PREFACE

This textbook is for use in either a one-semester statistics course or a two- 
semester sequence of statistics courses in education and the social sciences.
Chapters 1 through 8 can be covered thoroughly in a one-semester course 
meeting three hours weekly, with time left for sampling topics from Chapters 
9 and 14. Intensive study of Chapters 10 through 19 in a second course 
would constitute thorough preparation in the fundamentals of the inferential 
statistical techniques most useful for research in education, psychology, 
sociology, and other social sciences. 

In writing this text, we sought to produce a more thorough coverage 
of analysis of variance techniques than now appears in basic statistics texts 
for social scientists. In addition, an attempt was made to cover correlational
techniques comprehensively while emphasizing their interrelationships. We
recognize only too clearly the possible shortcomings of this work; but at 
the same time, we respectfully offer it to the profession with some small 
measure of pride. 

The generosity of publishing companies in allowing the reproduction of 
scientific material marks them as one of a rapidly dwindling brotherhood of 
altruistic agents in our society. The publication of this book (and probably
any other contemporary statistics textbook) would not have been possible
without the cooperation of several publishers. We are indebted to the 
following: the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., 
Dr. Frank Yates, F.R.S., and Oliver & Boyd Ltd., Edinburgh, for their 
permission to reprint tables from their books Statistical Methods for Research 
Workers and Statistical Tables for Biological, Agricultural and Medical
Research; Biometrika Trustees; Charles Griffin & Company Ltd.; Cambridge
University Press; Chandler Publishing Company; Annals of Mathematical 
Statistics; McGraw-Hill Book Company; Prentice-Hall, Inc.; The RAND 
Corporation; The Free Press; The Psychometric Society; The Psychological 
Corporation; Psychological Bulletin; The National Industrial Conference 
Board. 

During the five years while this text was being written, our colleagues 
and students contributed in innumerable ways to our efforts. We cannot 
name them all here, but the following colleagues deserve special thanks for 
contributing recently to our education in statistical methods: Frank B. Baker, 
R. Darrell Bock, Raymond O. Collier, Leon J. Gleser, Chester W. Harris, 
J. Thomas Hastings, John L. Horn, Henry F. Kaiser, William Kruskal, 
Leonard A. Marascuilo, Leslie D. McLean, Donald L. Meyer, Ellis B. Page, 
Robert M. Pruzek, Ronald G. Ragsdale, Robert E. Stake, George C. Tiao, 
and David E. Wiley. Several persons assisted in various ways during their 
graduate training: Alan Abrams, Donald Bosshart, Glenn Bracht, Russell 
Chadbourn, James Collins, Ralph Hakstian, Thomas Maguire, Masahito 
Okada, Perc Peckham, Andrew Porter, Robert Smith, and Peter Taylor. 
The assistance rendered by four persons merits special mention. Jason
Millman and Kenneth D. Hopkins graciously consented to the inclusion 
of materials which they earlier prepared with the first author for classroom 
use. Robert Mendro assisted in the preparation of solutions for the problem 
sets which follow each chapter. Marilyn D. Wang made excellent detailed 
suggestions for improving much of the book. The task of typing the various 
drafts of the manuscript was shared by Ann Beadleston, Harriet Clutterbuck, 
Linda Schmale, and Linda Venter. 

GENE V GLASS 

JULIAN C. STANLEY

Persons beginning the study of statistics should realize that [...]
popular image of statistics [...] expressed quantitatively.
[...] safeguard against an uncritical
acceptance of verbal nonsense, and a knowledge of statistics is the best
defense against quantitative nonsense. The first step toward replacing popular
images of statistics with more realistic ones is the study of the structure of
the discipline of “statistical methods” and its historical antecedents.


There were two widely divergent influences on the early development 
of statistical methods. Statistics had a mother who was dedicated to keeping 
orderly records of governmental units (state and statistics come from the
same Latin root, status) and a gentlemanly gambling father who relied on 
mathematics to increase his skill at playing the odds in games of chance. 
From the mother sprang counting, measuring, describing, tabulating, 
ordering, and the taking of censuses — all of which led to modern descriptive 
statistics. From the adventurous, intellectual father eventually came 
modern inferential statistics, which is based squarely on theories of prob- 
ability. A recent addition called the design of experiments relies heavily on 
a combination of probability theory and rather elementary but uncommon 
logic. This text offers an introduction to and a useful explanation of
descriptive statistics, inferential statistics, and the design of experiments.
Chapters 2 through 9 cover a large portion of descriptive statistics. Be- 
ginning with “Probability” in Chapter 10 and extending through Chapter 14, 
several topics from inferential statistics are covered. Chapters 15 through 19 
present the considerations and inferential techniques fundamental to the 
design and analysis of experiments. 

Descriptive statistics involves the tabulating, depicting, and describing 
of collections of data. These data may be either quantitative, such as 
measures of height and weight, or qualitative, such as sex and personality 
type. Large masses of data must generally undergo a process of summari- 
zation or reduction before they are interpretable by the human mind. A 
monkey is unsuccessful in his clumsy attempt to untie a simple knot because 
the complexity of the problem of untying the knot surpasses the resolving 
power of the poor creature's intellect. The unsuccessful fumbling attempt 
of a fisherman to unsnarl a backlash in his fishing reel is analogous to the
monkey's plight. For the fisherman, that backlash is a Gordian knot; it 
presents too great a problem for his finite intellect. Similarly, but at a 
different level, the human mind cannot extract the full import of a mass of 
data (How do they vary? About how large are they? Is one set useless in
reducing uncertainty about the other?) without the aid of special techniques 
(swords to cut the Gordian knot). Thus descriptive statistics serves as a 
tool to describe or summarize or reduce to manageable form the properties 
of a mass of data.

Inferential statistics is a formalized body of techniques for solving another
class of problems that present great difficulties for the unaided human mind.
This general class of problems characteristically involves attempts to infer
the properties of a large collection of data from inspection of a sample of
the collection. For example, a school nurse wishes to determine the pro- 
portion of children in the fifth grades of a large school system who have never 
had chicken pox. It would be unnecessary to question each child if the 
proportion could be reliably estimated from a sample of as few as 100 
children. But how does the proportion of children in the sample of 100 who 
haven’t had chicken pox relate to the analogous proportion in the entire 
fifth-grade population? The answer can be obtained through inferential 
statistics. Thus the purpose of inferential statistics is to surmise the prop- 
erties of a population from a knowledge of the properties of only a sample 
of the population. Inferential statistics builds upon descriptive statistics. 
The inferences are drawn from particular properties of samples to particular 
properties of populations; the descriptions of the properties of both the samples 
and the populations are obtained by methods of descriptive statistics. 
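
The school nurse's problem can be sketched in a few lines of code. This is only an illustration under stated assumptions: the population of 5,000 fifth graders and its true proportion of 0.30 are invented for the example, and the interval relies on the usual normal approximation for a sample proportion, which the text has not yet introduced.

```python
import math
import random


def estimate_proportion(sample):
    """Return the sample proportion and an approximate 95% interval.

    `sample` is a list of 0/1 values (1 = child has never had chicken pox).
    The interval uses the normal approximation p_hat +/- 1.96 * SE.
    """
    n = len(sample)
    p_hat = sum(sample) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    low = max(0.0, p_hat - 1.96 * se)
    high = min(1.0, p_hat + 1.96 * se)
    return p_hat, (low, high)


# Hypothetical population (an assumption for illustration only):
# 5,000 fifth graders, 30% of whom have never had chicken pox.
population = [1] * 1500 + [0] * 3500

rng = random.Random(0)                 # fixed seed so the run is repeatable
sample = rng.sample(population, 100)   # question only 100 children

p_hat, (low, high) = estimate_proportion(sample)
print(f"sample proportion: {p_hat:.2f}")
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")
```

The sample proportion is a description of the sample; the interval is the inferential step, a statement about the whole fifth-grade population based on the 100 children questioned.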

The design and analysis of experiments is a third important branch of 
statistical methods. These methods were developed for the discovery and 
confirmation of causal relationships among variables. Researchers in the 
social sciences are concerned with causation, a very complex concept in 
philosophy. Experimental design is so important for the study of causal 
relationships that in some philosophical systems an experiment constitutes 
an operational definition of a causal relationship. Adults make causal 
inferences during all of their waking moments. The frequent use of the word 
“because” reveals this: "The school bond failed to pass because it was not 
well publicized,” or “He scored poorly on the intelligence test because he 
was overly anxious about the consequences of the score.” 

The sentence, “Drug A kills pain faster than Drug B,” does not contain 
the word “because,” but it implies that “More of this group of patients than 
of that group gained fast relief from pain because Drug A was administered 
to the former, whereas Drug B was administered to the latter.” The weakness 
of the “because” explanation is its potential vagueness. This weakness is 
betrayed by the favorite remark of many young children when, at the pre- 
logical stages of their thinking, they are confronted with evidence of their 
misbehavior. If asked, “Why did you do that?” they respond, “Just 'cause." 
Obviously, the word has many denotations and connotations. 

Statistical methods assist researchers in describing data, in drawing 
inferences to larger bodies of data, and in studying causal relationships. 
They can be a useful tool in answering such questions as the following:
How old is the average man on the day he receives a Bachelor of Arts degree 
from a certain college? What percentage of these new graduates have blue 
eyes? What percentage of them are presently married? How many of them 
already have 0, 1, 2, ... children? Do those who earned good grades as
undergraduates attend graduate schools in greater proportions than those 
who earned mediocre grades? Has the international situation affected
attendance at graduate school? In an experiment, will college students
who are received with friendliness by a group conform more to that group’s
judgments than will college students shunned by a group? Is this differential
reaction, if found, contingent on the sex of the student? For instance, are
women more or less amenable than men to group influence? 

Mastering statistical methods requires some mathematical skill. The
subject, statistics, is a branch of applied mathematics. Statistics is
inadequately described in the dictionary as “the science of compiling facts.”
If statistics were only that simple, this text would be short indeed. In its
more rigorous form, statistics is usually called mathematical statistics. For
social scientists and other nonmathematicians it is termed “applied statistics”
and includes much use of intuition, simple arithmetic, and elementary
algebra. To study mathematical statistics seriously, one needs a background
including at least advanced calculus and matrix theory; however, much of the
rationale of applied statistics and many of its techniques can be learned
without this mathematical maturity, although of course not as deeply
understood. Perhaps this is partly the reason why various social sciences tend 
to be technique oriented. In large universities, separate courses in "educa- 
tional and psychological statistics," "sociological statistics," "economic 
statistics,” and the like can usually be found outside the Department of 
Statistics. Fortunately, however, the most fundamental principles are useful 
in nearly all disciplines, from agriculture to zoology. A knowledge of 
statistics is becoming necessary for the pursuit of a career of scholarship in 
any empirical discipline. Many graduate schools have recently acknowledged 
its importance by accepting course work in statistics as a replacement for 
one of the two foreign language courses that are traditionally required for the 
PhD degree. The substitution is apt: statistics is an increasingly important
means of communicating knowledge. The increasing recognition of statistics
as a tool of scholarship brings to mind the description of the education of 
children in B. F. Skinner’s utopian community, Walden Two: “We help them 
in every way short of teaching them. We give them new techniques of
acquiring knowledge and thinking. . . . We give them an excellent survey of
the methods and techniques of thinking, taken from logic, statistics, scientific
method, psychology, and mathematics. That’s all the ‘college education’
they need. They get the rest by themselves in our libraries and laboratories.”*

The word statistic is defined by Kendall and Buckland (1957)† as “A
summary value calculated from a sample of observations, usually but not
necessarily as an estimator of some population parameter; a function of
sample values.” The contrasting term is “parameter,” which we shall define
later. Thus, the arithmetic mean of the numbers 1, 4, and 4, which is 3, is a
statistic. The fact that a certain man has 2 children is a datum, whereas the
average number of children in a town is a statistic. (You can actually see
those 2 children, but not the average child.) This distinction between
“statistic” and “datum” is not always preserved, however. Some applied
statisticians and researchers use “statistic” to cover both, even saying that
a person’s name or hair color is a statistic.

* New York: The [...] Company (paperback [...]).
† [...] consult the bibliography [...].

Individual statistical techniques are unified by an underlying method. 
We shall attempt to show this unity and the interrelations as clearly as possible 
by using only the elementary mathematics that those studying this book will 
normally have learned in secondary school. Some special symbols will be 
introduced as needed; these will be explained carefully. They must be 
mastered at the point of introduction because thinking in statistics is 
facilitated by such symbols. 

The approach of this text is from descriptive to inferential, with the 
statistics and logic of controlled experimentation gradually introduced. 
Two goals toward which this textbook is directed are (1) ability to read 
reports of surveys, studies, investigations, and experiments in your sub- 
stantive field with moderate competence (given that you understand the 
substantive problems being researched) and (2) appreciable ability to plan 
your own studies and analyze data resulting from them. 

The amount you learn will depend on your quantitative aptitude and
diligence, plus the efforts of your instructor. The range of acquisition of
statistics during an academic year is usually great for the students in a given 
class. Some develop considerable expertise, and others build a sound basis 
for further study in class and out. A few students, especially those trau- 
matized by “symbol shock,” find the pace too fast and the explanations too 
scanty. If they are to avoid a life of stuttering every time they even try to 
pronounce “statistics,” their best hope is a skilled tutor to ease their anxiety, 
help them develop arithmetic and symbolic competence, and promote over- 
learning of the basic aspects needed for subsequent sections of the book. 
The sooner such a tutor is acquired the better. 

If you have not studied mathematics, logic, or any other rigorous and 
deductive body of knowledge for some time, you may find studying statistics 
uncomfortable for a while. In many disciplines characterized by vague 
verbal discourse and personalistic use of language, a student can sustain 
sloppy and erroneous thinking for long periods without being confronted
with its inadequacy. It is simply impossible to refute the statement that 
“We must educate the whole child” or that “Both heredity and environment 
are important in determining human intelligence.” However, the student of
statistics is likely to be confronted abruptly and uncomfortably with the 
results of loose thinking, as when a calculated quantity that cannot possibly
be negative resists one’s best attempts to make it come out positive. If you
are inclined toward critical and precise thought, this restrictive and confining 
mantle will soon begin to feel comfortable. The satisfying reassurance of 
knowing that you are mastering a logical and unambiguous language will 
outweigh the occasional pang of anxiety produced when you are discovered 
speaking illogically. Being openly and clearly wrong is the price we must
pay for knowing when we are correct. Never to know whether you are 
speaking sense or nonsense is too expensive a luxury to enjoy in an age 
when sense is precious and nonsense is rampant. 

It is easy to succumb to the delusion that you are learning a great deal 
about statistics from simply reading the text. A statistics text is not a novel; 
none will ever become a “Book-of-the-Month Club” selection. This
statistics text must be studied carefully and thoughtfully. Above all, the 
exercises and problems that follow each chapter must not be slighted. 
Working these exercises will put a fine edge on your knowledge of the subject. 
Skip the exercises and you may never know what you don't know about 
statistics. In statistics, as in most human endeavors, “A little learning is a 
dangerous thing; drink deep, or taste not the Pierian Spring.”*

Almost anyone can learn a great deal about statistics. Even many 
high-school students find it interesting, so do not think it accessible only to 
the specially appointed. A good textbook helps; in the hands of a gifted 
teacher it is doubly effective. While writing this book we have held in mind 
three functions of a textbook: it must be an effective pedagogic instrument; 
it must serve as a reference work once the material is learned; it must serve 
to direct the student toward the larger body of knowledge that it only
samples. Clearly the first function is primary, for there is no point in
referring to what is not known nor should one attempt to read more advanced
material until the fundamentals have been mastered. Master the funda- 
mentals now and use them the rest of your life. 




* Alexander Pope, Essay on Criticism (1711).

MEASUREMENT

Everyone chooses to define measurement in slightly different terms from
slightly different points of view. [...] is common to all definitions seems to
be this: Measurement is the assignment of numbers to things according to
rules. To measure a person’s height is to assign a number to the distance
[...] feet with the [...] of a ruler. Measurement of a [...]
the pattern of response that [...] to a group of standard problems.
Measurement transforms [...] attributes of our perceptions into familiar,
[...]. What an impossible world it would be if we did not measure! [...]
a traveler that Chicago is down [...]

MEASUREMENT SCALES

The ideas of “scales of measurement" constitute a useful set of concepts. 
Behavioral scientists, and very few other scientists, have been concerned 
with these problems. We shall now discuss briefly the different scales and
their implications for statistics. 


Nominal Measurement* 

Nominal measurement (giving a name or names) scarcely deserves to be called 
“measurement.” It is the process of grouping objects into classes so that
all of those in a single class are equivalent (or nearly so) with respect to some 
attribute or property. The classes are then given names; that the classes 
could as well receive, and often do receive, numerals for identification 
instead of names may account for the title “nominal measurement.”
Classificatory schemes in biology are examples of nominal measurement.
Psychologists often code “sex” by assigning 0 to “female” and 1 to “male”;
this is nominal measurement, too. We would perform nominal measurement if we
assigned 1 to Englishmen, 2 to Germans, and 3 to Frenchmen. Does one
Englishman plus one German equal one Frenchman (1 + 2 = 3)?
Obviously not. The numerals we assign in nominal measurement have all
the properties of any other numerals. We can add them, subtract them,
divide them, or simply see which is larger than another. But if the process 
by which we assigned the numerals to objects was nominal measurement, 
then our playing with the size, order, and other properties of the numerals 
will imply nothing at all about the objects themselves because we took no 
cognizance of the size, order, and other properties of the numerals when we 
assigned them. When measurement is nominal only, one uses only the 
property of numbers that 1 is distinct from 2 or 4 and that if object A has a 
1 and object B a 4, then A and B are different with respect to the attribute 
measured. It does not necessarily follow that B has any more of the attribute
than A. The three remaining scales of measurement we shall encounter
make use of three additional properties of numbers: numbers can be ordered
with respect to size; they can be added; and they can be divided. 
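
The point about nominal numerals can be checked with a small sketch. The six people and the alternative code values (30, 10, 20) below are hypothetical; only the codes 1, 2, and 3 for Englishmen, Germans, and Frenchmen come from the text. The sketch shows that class membership survives any one-to-one relabeling, while arithmetic on the codes does not.

```python
from collections import Counter


def class_counts(codes, names):
    """Count how many objects fall in each named class."""
    return Counter(names[c] for c in codes)


# Hypothetical group of six people, coded as in the text:
# 1 = Englishman, 2 = German, 3 = Frenchman.
codes_a = [1, 1, 2, 3, 3, 3]
names_a = {1: "Englishman", 2: "German", 3: "Frenchman"}

# An equally valid nominal coding of the same six people
# (the values 30, 10, 20 are arbitrary choices for illustration).
recode = {1: 30, 2: 10, 3: 20}
codes_b = [recode[c] for c in codes_a]
names_b = {30: "Englishman", 10: "German", 20: "Frenchman"}

counts_a = class_counts(codes_a, names_a)
counts_b = class_counts(codes_b, names_b)

mean_a = sum(codes_a) / len(codes_a)
mean_b = sum(codes_b) / len(codes_b)

print(counts_a == counts_b)   # class membership is unchanged by recoding
print(mean_a, mean_b)         # the "average code" depends on arbitrary labels
```

Any summary that changes when the labels are swapped, such as the mean of the codes, says something about the labels and nothing about the people.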

Ordinal Measurement

Ordinal measurement is possible when the measurer can detect differing
degrees of an attribute or property in objects. This being possible, he makes
use of the “orderedness” property of numbers and assigns numbers to objects
so that if the number assigned to A is greater than the number assigned
to B, then A possesses more of the property in question than B.
Suppose we ask a person to rank Mary, Jane, Alice, and Betty
from least beautiful to most beautiful. He may rank them as follows:
Betty, Jane, Mary, Alice. Ordinal measurement occurs when we assign the
numbers 1, 2, 3, and 4 to Betty, Jane, Mary, and Alice respectively. Notice
that the numbers 0, 23, 49, and [...] would have served just as well since the
distance between any two adjacent numbers is of no significance. We may
not feel that the measurer was capable of discerning, for example, whether
the difference between the amounts of beauty possessed by Betty and Jane is
greater or less than the difference in beauty between Jane and Mary. Hence,
no significance should be attached to the fact that the difference between
Betty’s and Jane’s scores is [...] the distance between Mary’s and
Alice’s scores.

Notice how the numbers take the place of the objects of concern. The
numbers are a partial [...]; we agree to treat the [...]
as if both the facts that [...] and that they be ordered
are important. At the [...] the numbers constitute
something of a reduction in effort. Instead of having [...]
least beautiful, Jane next least, Mary [...] Betty [...].

A scale of hardness of minerals [...] number. Suppose [...]
are assigned from [...]. (If, for example, the three top
students all had perfect [...] 2, [...]
the average of the [...]. This means of resolving
ties is conventional, because it keeps the sum of the tied and untied ranks
the same: 1 + 2 + 3 = 2 + 2 + 2.)

* [...] the concepts [...]

There is no law preventing one from adding, subtracting, multiplying,
etc., numbers that have been assigned to objects by ordinal measurement.
However, the results of these operations may reflect nothing about the
amounts of the property in question that the objects corresponding to the
numbers possess. For example, the difference between the “beauty scores”
of Alice and Betty is 3; the difference between the scores of Mary and Jane
is 1. Does this mean that the difference in beauty between Alice and Betty
is three times as great as the difference in beauty between Mary and Jane?
Of course it doesn’t. The results of the arithmetic cannot be interpreted
as saying anything about the amounts of the property actually possessed by
the objects. You can do what you wish with the numbers you obtain, but
you are always faced with the question, “Do the results of these operations
have any meaning for me?”
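
Two features of ordinal measurement lend themselves to a quick check: the convention of giving tied objects the average of the ranks they occupy, and the fact that any order-preserving set of numbers serves equally well. The sketch below is illustrative only. The function name `average_ranks`, the test scores, and the fourth value (60) in the second set of beauty scores are assumptions made for the example.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) downward.

    Tied scores share the mean of the ranks they would otherwise occupy,
    which keeps the sum of all ranks the same as with no ties.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2   # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Three students tie for the top score (hypothetical test scores).
class_ranks = average_ranks([100, 100, 100, 90])
print(class_ranks, sum(class_ranks))

# Two sets of numbers that order the four women identically.
beauty_a = {"Betty": 1, "Jane": 2, "Mary": 3, "Alice": 4}
beauty_b = {"Betty": 0, "Jane": 23, "Mary": 49, "Alice": 60}  # 60 is hypothetical

order_a = sorted(beauty_a, key=beauty_a.get)
order_b = sorted(beauty_b, key=beauty_b.get)
print(order_a == order_b)

# The ratio of differences is not preserved, so it carries no meaning.
ratio_a = (beauty_a["Alice"] - beauty_a["Betty"]) / (beauty_a["Mary"] - beauty_a["Jane"])
ratio_b = (beauty_b["Alice"] - beauty_b["Betty"]) / (beauty_b["Mary"] - beauty_b["Jane"])
print(ratio_a, ratio_b)
```

Both sets of numbers give the same ordering, yet the "three times as great" comparison holds for one set and not the other, which is why it cannot be a statement about beauty.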


Interval Measurement 

Interval measurement is possible when the measurer can distinguish not only 
between different amounts of the property in objects (the characteristic of 
ordinal measurement) but can also discern equal differences between objects. 
For interval measurement a unit of measurement (degree, inch, foot, ounce, 
etc.) has been defined. A number is assigned to an object that equals the 
number of units of measurement equivalent to the amount of the property 
possessed. For example, the temperature of a certain metal bar is 86° 
centigrade. An important feature which distinguishes interval measurement 
from ratio measurement (to be studied next) is that an object with a measure- 
ment of zero does not necessarily lack the attribute being measured. Hence, 
water at 0°C is not absolutely without temperature. The zero point on an
interval scale is an arbitrary one. 

The numbers assigned in the process of interval measurement have the 
properties of distinctness and order, but in addition the difference between 
the numbers is meaningful. The number assigned to the object is the number 
of units of measurement it has. Today’s temperature is 60°F; yesterday’s
temperature was 55°F. Today is 5° warmer than yesterday. If tomorrow’s
temperature is 70°F, then we know that yesterday and today are more alike
in temperature than are today and tomorrow. The difference between 55 
and 60 is half as large as the difference between 60 and 70; furthermore, the 
sizes of these differences tell us something about the temperature of the air. 
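
The temperature example can be verified numerically. The sketch below converts the three Fahrenheit readings to centigrade with the standard formula C = (F - 32) × 5/9, which is not given in the text, and compares what survives the change of scale. The variable names are chosen for this illustration.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees centigrade."""
    return (f - 32) * 5 / 9


yesterday, today, tomorrow = 55.0, 60.0, 70.0   # readings from the text, in °F

# Ratio of the two differences, computed on each scale.
diff_ratio_f = (today - yesterday) / (tomorrow - today)
diff_ratio_c = (f_to_c(today) - f_to_c(yesterday)) / (f_to_c(tomorrow) - f_to_c(today))

# Ratio of the readings themselves, computed on each scale.
value_ratio_f = tomorrow / yesterday
value_ratio_c = f_to_c(tomorrow) / f_to_c(yesterday)

print(diff_ratio_f, diff_ratio_c)     # both one half, apart from rounding
print(value_ratio_f, value_ratio_c)   # these disagree
```

The comparison of differences gives the same answer on either scale, so it reflects something about the air. The ratio of the readings changes with the arbitrary zero point, so it does not.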

The numbering of the years is an interval scale. The year 1 was
arbitrarily set originally as the year of the birth of Christ. The unit of
measurement is a span of 365¼ days. The year 1931 is more recent than any other
year with a smaller number. The length of time between 1776 and 1780 equals
the length of time between [...]. James K. Polk was president
for half as long (1845-1849) as [...] Eisenhower (1953-1961).

Interval measurement involves assigning numbers to objects in such a
way that equal differences in the numbers correspond to equal differences
in the amounts of the property or attribute measured. The zero point
can be placed arbitrarily and does not indicate
absence of the property measured.


Ratio Measurement 


Ratio measurement differs from interval measurement only in that the zero
point is not arbitrary but indicates total absence of the property measured.
The measurer can perceive [...] of the property, and he has a unit
of measurement with which [...] records differing amounts of the property.
Equal differences between the numbers assigned in measurement reflect equal
differences in the amount of the property possessed by the things measured.
Furthermore, since the zero point is [...] absolute, it is meaningful
to say that A has two, three, or four times as much of the property as B.

Height and weight are examples of ratio measurement scales. Zero
height is no height at all, and a man six feet tall is twice as tall as a
three-foot boy. The ratio scale is so named because the ratios of numbers on a
ratio scale are meaningful. These ratios can be interpreted as ratios of
amounts of the objects measured; a ratio statement about a strictly interval
scale has no meaning in terms of the amounts of the attribute in the objects.
[...] had a high temperature of 90°F and March 17 had a
[...] correctly that June 3 had twice as much temperature
as March 17.

Most measurement in educational research and in the behavioral sciences
occurs at the nominal, ordinal, [...] levels. Few important variables
in these fields as yet [...] to ratio measurement; in fact, one
must search diligently [...] measurement that will satisfy the
conditions of an interval [...]. Occasionally, ratio-scale variables such as
time (to solve a problem, [...] a list of words), height, weight, or distance
will be of interest, but such occasions arise infrequently. You must undertake
to recognize measurement at [...] nominal and ordinal levels and prepare
[...] interpretation of such data present.

Table 2.1 summarizes [...] what has been said thus [...]
about scales of measurement.
TABLE 2.1 SUMMARY OF CHARACTERISTICS AND EXAMPLES OF MEASUREMENT SCALES

Scale: Nominal
Characteristics: Objects are classified and classes are denoted by
numbers. That the number for one class is greater or less than another
number reflects nothing about the properties of the objects other than
that they are different.
Examples: Racial origin, eye color, numbers on football jerseys, sex,
clinical diagnoses, automobile license numbers, social security numbers

Scale: Ordinal
Characteristics: The relative sizes of the numbers assigned to the
objects reflect the amounts of the attribute the objects possess. Equal
differences between the numbers do not imply equal differences in the
amounts of the attributes.
Examples: Hardness of minerals, grades for achievement, ranking on a
personality trait, military ranks

Scale: Interval
Characteristics: A unit of measurement exists by which the objects not
only can be ordered but may also be assigned numbers so that equal
differences between the numbers assigned to objects reflect equal
differences in the amounts of the attribute measured. The zero point of
the interval scale is arbitrary and does not reflect absence of the
attribute.
Examples: Calendar time, Fahrenheit and centigrade temperature scales

Scale: Ratio
Characteristics: The numbers assigned to objects have all the properties
of those of the interval scale, and in addition an absolute zero point
exists on the scale. A measurement of 0 indicates absence of the
property measured. Ratios of the numbers assigned in measurement reflect
ratios in amounts of the property measured.
Examples: Height, weight, numerosity, time, temperature on the Kelvin
(absolute zero) scale
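
The Kelvin entry under ratio scales invites a check of the warning about temperature ratios. In the sketch below, the 90°F reading for June 3 comes from the text, while the 45°F reading for March 17 is a hypothetical value chosen so that the Fahrenheit ratio is exactly 2. The conversion K = (F - 32) × 5/9 + 273.15 is the standard one and is not taken from the text.

```python
def f_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (absolute-zero scale)."""
    return (f - 32) * 5 / 9 + 273.15


june_3 = 90.0     # high temperature given in the text, in °F
march_17 = 45.0   # hypothetical value, chosen so the °F ratio is exactly 2

ratio_f = june_3 / march_17                             # ratio on the interval scale
ratio_k = f_to_kelvin(june_3) / f_to_kelvin(march_17)   # ratio on the ratio scale

print(ratio_f)   # 2.0, which only looks like "twice as much temperature"
print(ratio_k)   # about 1.09, the ratio measured from absolute zero
```

Only the Kelvin ratio is measured from a true zero, so only it can be read as a ratio of amounts. On that scale June 3 was roughly 9 percent warmer than March 17, not twice as warm.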


[...] as well as they [...] so we recommend that you
[...] pass judgment on their position.
Read such works as the following [...] both pro and con opinions about
the validity of the above concepts [...]. Several of these articles
will be beyond the grasp of the [...]

Anderson, N. H., “Scales and statistics: parametric and nonparametric,”
Psychological Bulletin, 1961, 58, 305-16.

Kaiser, H. F., “Review of Measurement and Statistics by Virginia Senders,”
Psychometrika, 1960, 25, 411-13. [...] text is highly critical
of the position held by Stevens and [...]. [...] a cogent argument against
the [...] psychometrician and [...] that one’s [...] of measurement
dictates which statistics can be used.

Senders, Virginia L., Measurement and Statistics ([...] New
York, 1958). This textbook is organized around Stevens’ concepts; her
position is one of the more extreme [...] psychologists.

Siegel, S., Nonparametric Statistics for the Behavioral Sciences
(McGraw-Hill, New York, 1956). Siegel’s
position is identical to that of Stevens. Siegel’s text emphasizes which
statistical techniques are appropriate [...] which scales of measurement.
Although a useful text in many respects, [...] emphasis on “permissible” and
“appropriate” statistics is perhaps [...]

Stevens, S. S. (Ed.), “Mathematics, measurement and psychophysics,” in
Handbook of Experimental Psychology ([...]). This early
article revived interest in the problem of measurement scales and precipitated
debate on the issue.

You are likely to get the [...] from the works of Stevens, Senders,
Siegel, and others that in some [...] way a “scale” underlies certain
attributes. A certain set of numbers assigned to a group of objects definitely
fits into this one category or that: the scale is either nominal, ordinal,
interval, or ratio; and there is nothing in between. This position can lead
to chaos if held to in the face of the less well organized feelings of those who
actually perform psychological and educational measurements. Those in
the Stevens camp maintain, [...], that IQ scores form an ordinal
scale, not an interval scale. Uncritical acceptance of this decree forces one
to disregard completely the [...] of the differences between IQ scores.
Suppose Joe has an IQ of [...], Sam an IQ of 110, and Bob an IQ of 112.
If IQ is truly an ordinal scale, all that can be said is that Bob is more
intelligent than Sam who [...] intelligent than Joe. The statement that Bob
and Sam are more alike with respect to IQ than are Sam and Joe would not
be defensible. To say [...] statement cannot be made because IQ’s
are only ordinal would [...]. Ask the person who administered the
IQ test, and he would [...] you before he tested the children that Joe
is far less intelligent than Sam and Bob, who are much closer together. Try
to tell the tester to pay no attention to the sizes of the differences
between scores, and he will tell you to mind your own business, as he should.
Even though an IQ unit is not a completely equivalent unit of measurement at
different IQ levels, [...] are not on a par with a lowly ordinal scale.
The IQ scale [...] categorization as strictly ordinal or interval; perhaps it
is better to speak [...] “quasi-interval.”

It may often be important for a researcher to attempt to categorize his
scale of measurement. If the numbers a measurer assigns to n different
objects are no more than the n ranks 1, 2, ..., n (an ordinal scale), some
statements [...] numbers are meaningless in terms of the amounts of the
attribute possessed by the objects. The measurer should be forewarned that
this is so. [...] must also realize that if he has arbitrarily given “males”
[...] 2 (nominal measurement), the fact that 3 is greater
[...] 2 means nothing about the attribute measured, namely “sex.”



CHAP. 2 


MEASUREMENT, SCALES, AND STATISTICS 

manner, the distinctions between the various scales can be useful. However, 
except for a few infrequently used measures (such as time, length, and mass), 
educational and psychological measurements, especially clinical
measurement, defy any easy categorization as “ordinal” or “interval.” Surely,
the author of a textbook is in no position to pass judgment on the level of 
measurement at which one is working unless he, too, is intimately involved 
in the particular problem. 

We shall not develop the notions of scales of measurement further. 
Only a few of the statistical techniques to be discussed in this text were 
developed with an eye toward the relationship of measures to the things being 
measured. The nature of this relationship is the concern of the measurement 
specialist. Statistical methods are means of analyzing numbers as numbers, 
not as true amounts of some attribute. Any statistical technique can be 
applied to any conglomeration of numbers (with some limitations, of course), 
but we know of no technique which refuses to be performed because the 
numbers put into it are not “proper.” Statistical methods (with the possible
exception of some psychometric scaling methods) do not add to or subtract 
from the meaningfulness of the numbers on which they are performed. This 
point was made with humor and insight by Kaplan (1964, pp. 205-6): 


Mathematics can spare us the painful necessity of doing our own 
thinking, but we must pay for the privilege by taking pains with our thinking 
both before and after mathematics comes into play. 

I recall a childhood puzzle which takes advantage of just this necessity.
Three men registered at a hotel, paying ten dollars each for their rooms. 
The clerk, later realizing that the three rooms constituted a suite, for which 
the charge was only twenty-five dollars, gave five dollars to the bellhop to 
refund to the guests. Since five dollars is not evenly divisible by three, as 
well as for other less subtle reasons, the bellhop kept two dollars for himself 
and returned only three dollars as a refund. On his way back he calculated 
as follows. "They each paid ten dollars, making thirty dollars in all. I 
returned three dollars, or one dollar to each of them, so they each really 
paid nine. Now three times nine is twenty-seven, and two dollars I kept, 
making twenty-nine. Where is the thirtieth dollar?" Of course, if his two
dollars is subtracted from the twenty-seven, not added, the remainder is
twenty-five, the amount paid the hotel. We are quite free to add the numbers
if we wish, but not to expect the sum to represent anything in the situation.
What is missing in the bellhop's manipulations is not the dollar but good
sense; his logic was no better than his morals.


2.3 

VARIABLES AND THEIR 
MEASUREMENT 


A variable is a characteristic of persons or things, e.g., weight, age, reaction
time, ideational fluency, reading speed, number of children, number of
students. Intuition and experience tell us that some variables are
continuous, such as weight, age, and reaction time. We know just as surely
that some variables are discrete (they can take on only separated values),
such as number of children. Usually, discrete variables are those that are
measured by counting. “Number of children” can give rise to the numbers
0, 1, 2, . . . ; it is not possible for this variable to take on a fractional value.

On the other hand, if one had the instrumentation, resources, and time,
one could measure continuous variables to as fine a degree as one wished.
We do not need to stop measuring elapsed time in a footrace after we have
determined tenths of seconds. Even though it is reported that the
hundred-yard dash was run in 10.4 seconds, more precise timing equipment
might have revealed the time to be 10.416 seconds. But even this time is not
exactly correct; it is merely correct to thousandths of a second. The actual
or exact value of a variable is something that can never be attained because
measurement must always stop short of the exact value.

Standing opposite the exact value of a variable is a reported value. The
reported value is the value the measuring process produced. We do not
expect the reported and actual values of a variable to coincide, but the former
yields bounds for the latter. For example, if a person's height is 62 in.
measured to the nearest inch, his actual height at that time and under
those conditions is between 61.5 and 62.5 in.

The measurement of any continuous variable should always be accompanied
by a statement of the sensitivity of the measuring process. Races
are timed to the nearest tenth of a second; height may be measured to the
nearest inch; ages might be measured to the nearest day. The sensitivity
of a measuring process is the smallest unit of the number scale which is
reported; in the above examples the sensitivities are tenths of a second,
inches, and days, respectively.

We often wish to establish limits around any reported value within
which the exact value lies. For example, what are the lowest and highest
actual heights that will result in a reported height of 58 in. if measurement
of height is to the nearest inch? The limits of the exact value around any
reported value are found by adding and subtracting one-half the sensitivity of
the measuring process from the reported value. Thus a person with a reported
height of 58 in. has an actual height between 58 in. - (1 in./2) = 57.5 in. and
58 in. + (1 in./2) = 58.5 in. The examples below should clarify this procedure.
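The rule for the limits of the exact value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `exact_value_limits` is hypothetical (it does not come from the text), and decimal inputs are passed as strings so that the arithmetic is done with exact fractions rather than rounded floating-point numbers.

```python
from fractions import Fraction

def exact_value_limits(reported, sensitivity):
    """Return (lower, upper) limits of the exact value.

    The limits are the reported value minus and plus one-half the
    sensitivity of the measuring process. Both arguments may be ints
    or decimal strings such as "10.4" (hypothetical helper).
    """
    reported = Fraction(reported)
    half_unit = Fraction(sensitivity) / 2
    return reported - half_unit, reported + half_unit

# Height of 58 in. measured to the nearest inch.
low, high = exact_value_limits(58, 1)
print(float(low), float(high))   # 57.5 58.5

# Dash time of 10.4 seconds measured to the nearest tenth of a second.
low, high = exact_value_limits("10.4", "0.1")
print(float(low), float(high))   # 10.35 10.45
```

The same call handles any unit, provided the reported value and the sensitivity are expressed in that same unit.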


h -ch iVZ.'. Tte S m”' be' 3""- one lime ,0 Ihe next. 


perfectly stable score.
A complicated experiment might produce data which are tabulated in a square
with rows and columns; and a symbol $X$ could need three subscripts, $i$, $j$, and
$k$, to specify a particular number. For example, $X_{1,3,6}$ (usually written
$X_{136}$) stands for the sixth number in the cell formed by the intersection of
row 1 and column 3.


SIGMA (Σ) NOTATION

You may wish to concentrate only on the first three-quarters of this section 
the first time through. You may safely omit the material in this section 
beyond “Rule 3” below; then you can return to studying the remaining 
fourth of the section before embarking upon Chapter 15.

The analysis of most data involves adding, subtracting, multiplying, and 
dividing numbers, among other things. Since we want to talk about per- 
forming these operations on a group of numbers in general, we will perform 
operations on the symbols for the numbers. 

$X_1, X_2, \ldots, X_n$ stands for a group of $n$ numbers, any one of which can
be referred to as $X_i$, the $i$th number. $X_1 + X_2$ stands for the sum of the first
and second numbers. The ordering of the subscripts is usually completely
arbitrary; $X_2 + X_1$ could be used to designate the sum of the first and second
numbers, instead. $X_1 + X_2 + X_{10}$ stands for the sum of the first, second, and
tenth numbers.

Often we want to add up all of the numbers in a group. If there are
five numbers in the group, $n = 5$ and the sum of all the numbers is
$X_1 + X_2 + \ldots + X_5$. $X_1 + X_2 + \ldots + X_n$ stands for the sum of all $n$
numbers in a group when the exact value of $n$ is not specified.

An abbreviation for $X_1 + X_2 + \ldots + X_n$ which is frequently used is
$\sum_{i=1}^{n} X_i$:

$$\sum_{i=1}^{n} X_i \text{ means } X_1 + X_2 + \ldots + X_n.$$

$$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3. \qquad \sum_{i=3}^{5} X_i = X_3 + X_4 + X_5.$$

$\Sigma$ is the Greek capital letter “sigma.” $\sum_{i=1}^{5} X_i$ is read “the sum of
$X_i$ as $i$ goes from 1 to 5.” $\sum_{i=1}^{n} X_i$ is read “the sum of $X_i$ as $i$
goes from 1 to $n$.”

Admittedly, this compact Σ-notation is economical. Statisticians make
great use of it. You will have to learn how to interpret and use Σ-notation
before you learn much about statistics. The notation takes the place of a
set of directions; at first you will have to translate the symbols into a
set of verbal directions. As you gain greater facility with Σ-notation,
however, you will respond automatically to $\sum_{i=1}^{n} X_i$ without first having
to say to yourself, “the sum of $X_i$ as $i$ runs from 1 to $n$.”

Adding up numbers after something has been done to them, such as
multiplying each number by 6, or squaring each number (that is, multiplying
it by itself), is as common as simply adding the numbers as they are. Suppose
one wants to multiply each of $n$ numbers by 2 and add together the resulting
$n$ products. The desired sum will be

$$2X_1 + 2X_2 + \ldots + 2X_n.$$

But surely you see that this sum is the same as

$$2(X_1 + X_2 + \ldots + X_n).$$

Using the shorthand Σ-notation discussed above, we can replace
$(X_1 + X_2 + \ldots + X_n)$ by $\sum_{i=1}^{n} X_i$. The result can be summarized as follows:

$$2X_1 + 2X_2 + \ldots + 2X_n = \sum_{i=1}^{n} 2X_i = 2\sum_{i=1}^{n} X_i.$$

This result did not come about because of any magic in the number 2;
with 4, 60, or 131.4 the result is the same. In fact, if $c$ stands for any constant
number (i.e., a number that does not change regardless of what $i$ is), then

$$cX_1 + cX_2 + \ldots + cX_n = \sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i. \qquad \text{(Rule 1)}$$

If a constant number $c$ is to be added to each of $n$ numbers, one writes
$X_1 + c, X_2 + c, \ldots, X_n + c$.

The sum of the above values is

$$(X_1 + c) + (X_2 + c) + \ldots + (X_n + c) = \sum_{i=1}^{n} (X_i + c).$$

Always with addition of numbers we can regroup them in any way before
adding.

$$\sum_{i=1}^{n} (X_i + c) = (X_1 + X_2 + \ldots + X_n) + (c + c + \ldots + c).$$

The first sum in parentheses on the right-hand side above is $\sum_{i=1}^{n} X_i$.
What about the second sum in parentheses? How many $c$'s are added
together? The answer is $n$. So the second sum equals $nc$. Consequently,

$$\sum_{i=1}^{n} (X_i + c) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} c = \sum_{i=1}^{n} X_i + nc. \qquad \text{(Rule 2)}$$

If $c$ is one constant and $d$ is another, how else can you write $\sum_{i=1}^{n} (cX_i + d)$?
(Use Rules 1 and 2.)
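Rules 1 and 2 can be checked numerically. The following Python sketch is an illustration only: the three data values and the constants are arbitrary choices made for this example, and the last comparison shows one way to check an answer to the question just posed.

```python
# Illustrative values only; any numbers would serve.
X = [3, 6, 1]
n = len(X)
c, d = 4, 60   # two arbitrary constants

# Rule 1: the sum of cX_i equals c times the sum of X_i.
rule1_lhs = sum(c * x for x in X)
rule1_rhs = c * sum(X)

# Rule 2: the sum of (X_i + c) equals the sum of X_i plus n times c.
rule2_lhs = sum(x + c for x in X)
rule2_rhs = sum(X) + n * c

# Both rules together: the sum of (cX_i + d).
both_lhs = sum(c * x + d for x in X)
both_rhs = c * sum(X) + n * d

print(rule1_lhs, rule1_rhs)   # 40 40
print(rule2_lhs, rule2_rhs)   # 22 22
print(both_lhs, both_rhs)     # 220 220
```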

Another important expression is the sum of $n$ numbers after each
individual number has been squared,

$$(X_1 \cdot X_1) + (X_2 \cdot X_2) + \ldots + (X_n \cdot X_n) = X_1^2 + X_2^2 + \ldots + X_n^2,$$

which is symbolized as

$$\sum_{i=1}^{n} X_i^2.$$

Similarly,

$$X_1^3 + X_2^3 + \ldots + X_n^3 = \sum_{i=1}^{n} X_i^3,$$

although in elementary statistics there will seldom be an occasion to use
this expression.

Notice that $\sum_{i=1}^{n} X_i$ stands for a single number: the one that results from
the addition of $n$ numbers. $\sum_{i=1}^{n} X_i$ might be 10, 13, or 1300. $c\sum_{i=1}^{n} X_i$ is the
product of two numbers, $c$ and $\sum_{i=1}^{n} X_i$. $\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} X_i\right)$ is the product of a
number (a certain sum) and itself. We also write it as follows:

$$\left(\sum_{i=1}^{n} X_i\right)^2.$$

If $X_1 = 3$, $X_2 = 6$, and $X_3 = 1$, then $\sum_{i=1}^{3} X_i = 10$ and $\left(\sum_{i=1}^{3} X_i\right)^2 = 100$.

Is $\sum_{i=1}^{n} X_i^2$ always the same number as $\left(\sum_{i=1}^{n} X_i\right)^2$? [Hint: When does
$a^2 + b^2 = (a + b)^2$?] Calculate each when $X_1 = 2$, $X_2 = 1$, $X_3 = 4$, and
$X_4 = 1$.

A common expression in statistical analysis is

$$\sum_{i=1}^{n} (X_i + c)^2 = (X_1 + c)^2 + (X_2 + c)^2 + \ldots + (X_n + c)^2.$$

$(X_i + c)^2$, which is $(X_i + c)(X_i + c)$, can be written in a different way:

$$(X_i + c)(X_i + c) = X_i^2 + cX_i + cX_i + c^2 = X_i^2 + 2cX_i + c^2.$$

It is true, then, that $\sum_{i=1}^{n} (X_i + c)^2 = \sum_{i=1}^{n} (X_i^2 + 2cX_i + c^2)$. The expression
within the parentheses may be written $n$ times, as follows:

$$\begin{array}{l}
X_1^2 + 2cX_1 + c^2 \\
X_2^2 + 2cX_2 + c^2 \\
\qquad \vdots \\
X_n^2 + 2cX_n + c^2.
\end{array}$$

What is the sum of the first column above? It is $X_1^2 + X_2^2 + \ldots + X_n^2 =
\sum_{i=1}^{n} X_i^2$. What is the sum of the second column above? It is $2cX_1 +
2cX_2 + \ldots + 2cX_n = 2c(X_1 + X_2 + \ldots + X_n)$, which may be written
compactly as $2c\sum_{i=1}^{n} X_i$. What is the sum of the third column above?
$c^2 + c^2 + \ldots + c^2 = nc^2$. We see that

$$\sum_{i=1}^{n} (X_i + c)^2 = \sum_{i=1}^{n} X_i^2 + 2c\sum_{i=1}^{n} X_i + nc^2. \qquad \text{(Rule 3)}$$

Though it is correct to proceed in this way, by writing each individual
expression and summing the columns, it is unnecessary. Instead, one can
“distribute” the summation sign before each term, as follows, and secure
the same result more directly.

$$\begin{aligned}
\sum_{i=1}^{n} (X_i + c)^2 &= \sum_{i=1}^{n} (X_i^2 + 2cX_i + c^2) \\
&= \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} 2cX_i + \sum_{i=1}^{n} c^2 \\
&= \sum_{i=1}^{n} X_i^2 + 2c\sum_{i=1}^{n} X_i + nc^2.
\end{aligned}$$
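A short numerical check may help fix Rule 3 in mind. The Python sketch below is only an illustration: it reuses the three values 3, 6, and 1 from the example in the text, while the constant c = 2 is an arbitrary choice. It also shows that the sum of the squares and the square of the sum are different quantities.

```python
X = [3, 6, 1]   # the three values used in the text's example
c = 2           # arbitrary constant chosen for illustration
n = len(X)

sum_x = sum(X)                             # 3 + 6 + 1
sum_of_squares = sum(x ** 2 for x in X)    # 9 + 36 + 1
square_of_sum = sum_x ** 2                 # (3 + 6 + 1) squared

# Rule 3: sum of (X_i + c)^2 = sum of X_i^2 + 2c * (sum of X_i) + n * c^2.
rule3_lhs = sum((x + c) ** 2 for x in X)
rule3_rhs = sum_of_squares + 2 * c * sum_x + n * c ** 2

print(sum_of_squares, square_of_sum)   # 46 100
print(rule3_lhs, rule3_rhs)            # 98 98
```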

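Note carefully how the steps above were carried out, and verbalize each of
them to yourself.

The observations in the layout below are symbolized by $X$ written with two
subscripts, as $X_{ij}$, where $i = 1, \ldots, n$ denotes the position of the $i$th
observation or measurement ($X$) within the $j$th column ($j = 1, 2, \ldots, J$).

$$\begin{array}{cccc}
\multicolumn{4}{c}{\text{Treatment}} \\
1 & 2 & \cdots & J \\
\hline
X_{11} & X_{12} & \cdots & X_{1J} \\
X_{21} & X_{22} & \cdots & X_{2J} \\
\vdots & \vdots & & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nJ}
\end{array}$$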

These data could be from an experiment in which $n$ persons were given level 1
of a treatment, a different $n$ persons were given level 2, and so on up to the
$n$ persons given level $J$. There would be $nJ$ different persons in such an
experiment. Alternatively, each person might be given all $J$ treatments.
The two situations are quite different, as we shall see in later chapters.
(For the former, it is not necessary that the number of $X$'s in each column
be $n$, the same for all columns. Instead, the number could be $n_j$, where
$j = 1, 2, \ldots, J$; $n_3$ would then mean the number of $X$'s in the third column,
$n_2$ the number in the second column, etc. Here we keep the $n_j$'s equal, for
simplicity.)

The sum of all the numbers at level 1 (i.e., column 1) is $X_{11} + X_{21} +
\ldots + X_{n1}$. Notice that the first number in the subscript tells what row the
observation is in, and the second number tells the column of the observation.
To find the sum for column 1, which is $X_{11} + X_{21} + \ldots + X_{n1}$, we sum $i$
from 1 to $n$ while $j$ keeps the value 1. We write this as $\sum_{i=1}^{n} X_{i1}$. The expression
$\sum_{i=1}^{n} X_{i2}$ denotes the sum over $i$ from 1 to $n$ while $j$ remains 2, i.e., the sum of
the observations in column 2.

$\sum_{j=1}^{J} X_{1j}$ is $X_{11} + X_{12} + \ldots + X_{1J}$ (read “the sum over $j$ from 1 to $J$ for
$i = 1$”). This expression denotes the sum of the observations in row 1 of the
layout. $\sum_{i=1}^{n} X_{ij}$ is the sum of the $n$ numbers in column $j$; $\sum_{j=1}^{J} X_{ij}$ is the sum of
the $J$ numbers in row $i$.

How could we denote the grand sum of all $nJ$ numbers from the experiment?
One way would be to add up the numbers in each column individually
and then add the $J$ column sums together:

$$\text{Grand Sum} = \sum_{i=1}^{n} X_{i1} + \sum_{i=1}^{n} X_{i2} + \ldots + \sum_{i=1}^{n} X_{iJ}.$$

This sum of $J$ numbers can be denoted more simply:

$$\text{Grand Sum} = \sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}.$$

The grand sum squared has the form $\left(\sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}\right)^2$. The symbol for the
sum of all $nJ$ numbers which result from squaring each original observation
and then adding together the squared numbers is $\sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}^2$. This expression
is read as follows: “The double summation of $X$ sub $ij$ squared as $j$ runs
from 1 to $J$ and $i$ runs from 1 to $n$.” First $j$ is given the value 1 as $i$ runs from
1 to $n$, then $j$ is given the value 2 as $i$ runs from 1 to $n$, etc.

The sum, across columns, of the squared values of each column sum
is denoted by

$$\sum_{j=1}^{J} \left(\sum_{i=1}^{n} X_{ij}\right)^2.$$
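The double-summation expressions of this section can be illustrated with a small layout. In the Python sketch below, the 3-by-2 array of values is hypothetical and chosen only to keep the arithmetic easy; rows play the role of the subscript i and columns the role of the subscript j (Python counts from 0, so `X[0][1]` corresponds to the first row and second column).

```python
# Hypothetical layout: n = 3 observations (rows) under J = 2 treatment levels (columns).
X = [
    [1, 4],
    [2, 5],
    [3, 6],
]
n = len(X)       # number of rows
J = len(X[0])    # number of columns

# Column sums: for each fixed j, sum over i.
column_sums = [sum(X[i][j] for i in range(n)) for j in range(J)]

# Row sums: for each fixed i, sum over j.
row_sums = [sum(X[i][j] for j in range(J)) for i in range(n)]

# Grand sum: add the J column sums together.
grand_sum = sum(column_sums)

# Grand sum squared, sum of all squared observations,
# and sum across columns of the squared column sums.
grand_sum_squared = grand_sum ** 2
sum_of_squared_observations = sum(X[i][j] ** 2 for j in range(J) for i in range(n))
sum_of_squared_column_sums = sum(s ** 2 for s in column_sums)

print(column_sums, row_sums)            # [6, 15] [5, 7, 9]
print(grand_sum, grand_sum_squared)     # 21 441
print(sum_of_squared_observations)      # 91
print(sum_of_squared_column_sums)       # 261
```

The last three quantities are all different, which is why the placement of the parentheses and of the exponent matters.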



There will be a time when we want to talk about adding one constant
value to every number in column 1 of an array and a different constant to
every number in column 2. Since the value of the constant depends only
on the column and not on its position in the column, the subscript $i$ is not
needed to identify the constant. It is sufficient, then, to speak of $c_1$ and $c_2$.
We can denote a constant value for the $j$th column by $c_j$. Thus, if we wanted
to talk about $X_{ij}$ plus a constant which is different for each column but the
same for all $n$ observations in one column, we denote this quantity by
$X_{ij} + c_j$. The set of such quantities for the $j$th column is

$$\begin{array}{l}
X_{1j} + c_j \\
X_{2j} + c_j \\
\qquad \vdots \\
X_{nj} + c_j.
\end{array}$$

Thus

$$\sum_{i=1}^{n} (X_{ij} + c_j) = \sum_{i=1}^{n} X_{ij} + nc_j, \qquad \text{(Rule 4)}$$

because $c_j$ is a constant with respect to summation over $i$. For double
summation,

$$\sum_{j=1}^{J} \sum_{i=1}^{n} (X_{ij} + c_j) = \sum_{j=1}^{J} \left(\sum_{i=1}^{n} X_{ij} + nc_j\right) = \sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij} + n\sum_{j=1}^{J} c_j.$$

If a constant value $d$ were to be added to all $nJ$ observations, then no
subscripts for $d$ would be needed. The value of $d$ is the same regardless of
what row and column it is in. You should be convinced that

$$\sum_{j=1}^{J} \sum_{i=1}^{n} (X_{ij} + d) = \sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij} + nJd. \qquad \text{(Rule 5)}$$

Because $d$ is added to observations $n$ times in each of $J$ columns, it figures in
the grand sum a total of $nJ$ times.

If you are not yet convinced, list the symbols for every one of the $nJ$
observations or measurements, as below, and sum them all:

$$\begin{array}{cccc}
1 & 2 & \cdots & J \\
\hline
X_{11} + d & X_{12} + d & \cdots & X_{1J} + d \\
X_{21} + d & X_{22} + d & \cdots & X_{2J} + d \\
\vdots & \vdots & & \vdots \\
X_{n1} + d & X_{n2} + d & \cdots & X_{nJ} + d.
\end{array}$$

Obviously, there are $n$ $d$'s in each column, and there are $J$ columns, so there
are $nJ$ $d$'s in all.
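Rules 4 and 5 can be verified numerically in the same way as the earlier rules. The Python sketch below is only an illustration: the 3-by-2 layout, the column constants in `c`, and the single constant `d` are all hypothetical values chosen for this example.

```python
# Hypothetical layout: n = 3 rows, J = 2 columns.
X = [
    [1, 4],
    [2, 5],
    [3, 6],
]
n, J = len(X), len(X[0])
c = [10, 20]   # one constant per column (hypothetical values)
d = 7          # a single constant added to every observation

grand_sum = sum(X[i][j] for j in range(J) for i in range(n))

# Rule 4, column by column: sum over i of (X_ij + c_j) = column sum + n * c_j.
rule4_lhs = [sum(X[i][j] + c[j] for i in range(n)) for j in range(J)]
rule4_rhs = [sum(X[i][j] for i in range(n)) + n * c[j] for j in range(J)]

# Double summation of (X_ij + c_j) = grand sum + n * (sum of the c_j).
double_lhs = sum(X[i][j] + c[j] for j in range(J) for i in range(n))
double_rhs = grand_sum + n * sum(c)

# Rule 5: double summation of (X_ij + d) = grand sum + n * J * d.
rule5_lhs = sum(X[i][j] + d for j in range(J) for i in range(n))
rule5_rhs = grand_sum + n * J * d

print(rule4_lhs, rule4_rhs)     # [36, 75] [36, 75]
print(double_lhs, double_rhs)   # 111 111
print(rule5_lhs, rule5_rhs)     # 63 63
```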


PROBLEMS AND EXERCISES 

1. Categorize each of the following as either nominal, ordinal, interval, or ratio 
measurement: 

a. Zip-code numbers 

b. Academic rank (assistant professor, associate professor, professor) as a
measure of length of service

c. Metric system of measuring distance 

d. Telephone numbers 

2. A teacher builds a spelling test by selecting a representative sample of 200 words 
from a particular dictionary. 

a. What scale of measurement is being employed if the teacher scores the test 
as follows: 

0- the student spelled at least one plural word incorrectly 

1- the student spelled all plurals correctly 

b. The teacher counts the number of correct spellings and calls this number a
measure of “general spelling ability.” What scale of measurement is being
used (nominal, ordinal, interval, or ratio)?

3. Determine the limits of the exact value corresponding to the reported value in
each of the following instances:

   Variable             Sensitivity of measurement    Reported value    Limits of exact value

a. Age                  Nearest month                 6 yr 5 months

b. Weight               Nearest half ounce            2 lb 13.0 oz

c. Monetary value       Nearest dollar                $343

4. Let the symbol $X_{ij}$ denote the $i$th datum in the $j$th group of data.

a. Write the symbol for the 1st datum in the 2nd group.

b. Write the symbol for the 4th datum in the 1st group.

c. Write the symbol for the 2nd datum in an arbitrary group.

5. Let $X_1 = 2$, $X_2 = 7$, $X_3 = 1$, $X_4 = 3$, $X_5 = 2$, and $X_6 = 4$. Evaluate each of
the following:

•■k- b. i-r, - c.j;jsr,_ i-tv-t* ■>- 

i~l i-2 <—l i=3 

6. Let $X_1 = 0$, $X_2 = 4$, $X_3 = 8$, $X_4 = 2$, and $X_5 = 1$. We see that $\sum_{i=1}^{5} X_i = 15$.
Evaluate each of the following without operating on the original five numbers.

a. $\sum_{i=1}^{5} 4X_i =$ (Rule 1)        b. $\sum_{i=1}^{5} (X_i + 3.1) =$ (Rule 2)

c.iw-2)- (i*J - 

7. Consider the following data: 

$X_{11} = 4$    $X_{12} = 3$

$X_{21} = 2$    $X_{22} = 2$

$X_{31} = 6$    $X_{32} = 1$

$X_{41} = 3$    $X_{42} = 5$

a. Hx«~ b.2^- 

i -\ ,-l 

c. %X 3l *= d. - 

8. Write out the following expressions:

\?5“ \t*- c (,t jr ‘J" 

9. Change the following expressions into sigma-notation:

a. $3X_1 + 3X_2 + 3X_3 =$        b. $(X_1 + \ldots + X_n)^2 =$

3.1

TABULATION OF DATA 

Before quantitative data can be understood and interpreted, it is usually 
necessary to summarize them. Table 3.1 shows a class record for a reading 
readiness test administered at the beginning of the school year. The scores 
appear in alphabetical order as they are recorded in the teacher’s class roll 
book. However, the scores do not mean very much in this form, and we 
can tell only with some difficulty whether, for example, the first-listed pupil
(David A), with a score of 90 points out of a possible 128, is superior or just 
average in reading readiness, compared with his classmates. 


Rank Order 

Ordinarily the first step is to arrange the scores in order of size, usually
from highest to lowest. This is called an ungrouped series. In a small class,
this is often all that is necessary. Table 3.2 shows the same 38 scores as Table
3.1, arranged in order of size from 112 to 44. This table also shows the rank
order of the pupils (1st, 2nd, . . . , 38th) and the scores tabulated without
further grouping. It is now easy to see that David A.'s score of 90 gives him
a rank of 13 in a class of 38, or about one-third of the way from the top.
Similarly, it is easy to interpret each of the other scores in terms of rank.
But ties are likely to occur, especially in classes of 20 or more pupils. Notice,
for example, that two pupils made a score of 97. Since it is not correct to
say that one ranks higher than the other, we must assign them the same rank.
Since there are six pupils who rank higher (1, 2, 3, 4, 5, 6), the next two
ranks, 7 and 8, are averaged, giving 7.5. In like manner the average of
ranks 9 and 10 is 9.5, and so on for the other pupils with tied scores. There
are three pupils with scores of 75, and there are 21 pupils who rank above
this score; the average of the next three ranks (22, 23, and 24) is 23, which is
the rank assigned to each of the scores of 75. In addition to the time and
trouble required to determine these ranks, the list is long, unwieldy, and
inadequate for making comparisons with other classes that are much larger
or much smaller; ranking 19th in a class of 38 pupils is poorer than ranking
19th in an equally capable class of 70 pupils.
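The ranking procedure, including the averaging of tied ranks, can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `average_ranks` is hypothetical, and the scores are the 38 values of Table 3.1 listed in the order of the teacher's roll book.

```python
from collections import Counter

# The 38 reading readiness scores of Table 3.1.
scores = [90, 66, 106, 84, 105, 83, 104, 82, 97, 97,
          59, 95, 78, 70, 47, 95, 100, 69, 44, 80,
          75, 75, 51, 109, 89, 58, 59, 72, 74, 75,
          81, 71, 68, 112, 62, 91, 93, 84]

def average_ranks(values):
    """Map each distinct score to its rank, highest score first.

    Tied scores share the average of the ranks they jointly occupy
    (hypothetical helper written for this illustration).
    """
    counts = Counter(values)
    ranks = {}
    next_rank = 1
    for score in sorted(counts, reverse=True):
        k = counts[score]                       # number of pupils tied at this score
        ranks[score] = next_rank + (k - 1) / 2  # average of next_rank .. next_rank + k - 1
        next_rank += k
    return ranks

ranks = average_ranks(scores)
print(ranks[90])   # 13.0
print(ranks[97])   # 7.5
print(ranks[75])   # 23.0
```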

TABLE 3.1 A CLASS RECORD FOR A READING READINESS TEST (38 PUPILS)

Pupil         Score    Pupil          Score    Pupil        Score    Pupil         Score

David A.        90     Robert D.        59     Jerome L.      75     Paul S.         81
Barbara B.      66     Dan F.           95     Rosa M.        75     Richard S.      71
Charles B.     106     Larry F.         78     Billy N.       51     Robert S.       68
Robert B.       84     Richard G.       70     Nancy O.      109     William S.     112
Mildred C.     105     Grover H.        47     Carrie P.      89     Jean T.         62
Robbin C.       83     Robert H.        95     Ralph R.       58     Adolfo W.       91
Robert C.      104     Sylvia H.       100     George S.      59     Dolores W.      93
Diney D.        82     Warren H.        69     Gretta S.      72     Richard W.      84
Jim D.          97     Clarence K.      44     Jack S.        74
John D.         97     David K.         80     Mary S.        75




The Frequency Distribution 

The list of scores can be made shorter by arranging the scores in a frequency 
distribution, sometimes simply called a distribution. The third and fourth 
columns of Table 3.2 show the simplest form of a distribution. The various 
scores are arranged in order of size, here from 112 to 44, and to the right of 
each score is recorded the number of times it occurs. Each entry to the right
of a score is called a frequency, abbreviated f, and the total of the frequencies
is represented by n.

TABLE 3.2 READING READINESS SCORES FROM TABLE 3.1 ARRANGED IN ORDER OF SIZE,
RANKED, AND TABULATED

                                    Tabulated without further grouping
Order of size        Rank           Score        Frequency (f)


When there is a large number of scores, it is desirable to carry the
summarization of data one step further. As a rule, there is such a
wide range of scores that it is best to group them according to size,
such as a group including all scores from 110 to 114, another including
the next five scores below, and so on. Each such group is called a score class.
The complete grouping arrangement is usually referred to as a grouped frequency
distribution. Although there is no fixed rule for the number of score classes,
it is usually best to make not fewer than 12 classes nor more than about 15.
To have fewer than 12 classes is to run the risk of distorting the results,
whereas more than 15 classes produce a table that is inconvenient to handle.


Constructing the Grouped 
Frequency Distribution 

There are four steps in making the ordinary grouped frequency distribution. 

These are shown in Table 3.3, using the scores given in Table 3.1. 

1. Determine the inclusive range, which is 1 plus the difference between
the highest score and the lowest. Of these scores, the highest is 112
and the lowest is 44, which gives a range of (112 - 44) + 1 = 69.
Actually, 112 is considered to cover the one-point score interval
112.5-111.5, and 44 the interval 44.5-43.5. Notice, therefore, that
the range is 69 [(112 - 44) + 1, or 112.5 - 43.5]. The real score
limits are not always fractional, however. If age is reckoned at the
last (most recent) birthday, then persons who report themselves as
being 44 years old (that is, not yet 45) lie within the interval
44.00-44.99 . . . (almost, but not quite 45.00), whose midpoint is
44.5. If they report age to nearest birthday, the interval is 43.5-44.5,
with a midpoint of 44. Similarly, if they report themselves
“going on 44,” the interval is 43.00-43.99 . . . , with midpoint 43.5.
There will be a difference of almost two years between the youngest
possible “going on 44” person, who has just reached the age of 43,
and the oldest possible “44 last birthday” respondent, who is almost
45. When we ask merely for “Age ____,” without specifying
the reckoning system, we will not be able to interpret our results
precisely.

2. Select the score-class grouping interval, which is the width of the
groups into which the scores are to be classified, so that there will
be not fewer than 12 score classes nor more than 15. To do this,
divide the range by 12 to find the largest group, or score-class
interval, to be used. Divide the range by 15 to find the smallest
class interval to be used. In this case, 69 ÷ 12 = 5.75, and
69 ÷ 15 = 4.60. Since it is impractical to use any class interval
except a whole number, the larger number, 5.75, is “rounded down”
to 5 and the 4.60 “rounded up” to 5, even though a class interval
of 6 would yield 12 score classes for these 38 scores. Odd-numbered
interval widths such as 5, which have whole-number midpoints
when the score-class limits are fractional (end in .5), are usually
preferred to even-numbered interval widths, which have fractional
midpoints when the class limits are fractional. The midpoint of the
score class 110-114, which contains the 5 scores 110, 111, 112, 113,
and 114, is 112 (that is, 110 + [(114 - 110) ÷ 2] = 110 + (4 ÷ 2) =
110 + 2 = 112). (Another way to determine the midpoint of an
interval is simply to average the reported limits of the interval:
(110 + 114) ÷ 2 = 112.) If a class size of 6 were used, with score
limits of 108-113, for example, the midpoint of this even-numbered
group would be 110.5, which might result in more complex
computations. Hence, a class interval of 5 is preferable to 6 when the
class limits are fractional.

3. Determine the limits of the classes. There must, of course, be
enough classes to include the highest score and the lowest score.
To facilitate tabulation, start each class with a multiple of the class
interval. If the lowest class starts with 40, which is a multiple of 5,
it will accommodate the lowest score, 44, whereas a class beginning
with 45 will not. Each succeeding whole-number class lower limit
will be 5 points above the one just below it. The next class will
start at 45, the next at 50, and so on, until the highest score, 112, is
included in the class 110-114.


4. Make the tabulation. A tally is made for each score opposite the 
class in which it falls. To make the tabulation it is not necessary 
to have the scores arranged in order, for this process may require 
more time than the tabulation itself. In the original alphabetical 
list, the first score is 90. In the tabulation column opposite the 
class which begins with 90, a tally line is drawn to indicate the score. 
The next score is 66. This falls in the class which begins at 65, so a 
tally is made there. In the same way, a tally is placed in the column
opposite the appropriate class for each of the other scores. 


In the finished table, the steps by which it was made do not appear. 
Only two columns occur in the simplest form of a frequency distribution. 
The first shows the various classes, usually arranged in descending order from 
top to bottom, and the second shows the frequencies — the number of scores 
in each class. 
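The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a general-purpose routine: the function name `grouped_frequency` is hypothetical, the scores are assumed to be non-negative whole numbers, and each class is assumed to start at a multiple of the class interval, as Step 3 recommends.

```python
import math
from collections import Counter

# The 38 reading readiness scores of Table 3.1.
scores = [90, 66, 106, 84, 105, 83, 104, 82, 97, 97,
          59, 95, 78, 70, 47, 95, 100, 69, 44, 80,
          75, 75, 51, 109, 89, 58, 59, 72, 74, 75,
          81, 71, 68, 112, 62, 91, 93, 84]

def grouped_frequency(values, width):
    """Return rows (lower limit, upper limit, midpoint, frequency), highest class first.

    Each class starts at a multiple of `width` (hypothetical helper).
    """
    first_start = (min(values) // width) * width
    # Tally each score under the lower limit of the class in which it falls.
    tallies = Counter((v // width) * width for v in values)
    starts = range(first_start, max(values) + 1, width)
    return [(s, s + width - 1, s + (width - 1) / 2, tallies[s])
            for s in reversed(starts)]

# Step 1: inclusive range.
inclusive_range = max(scores) - min(scores) + 1

# Step 2: largest and smallest whole-number class intervals.
largest_width = math.floor(inclusive_range / 12)    # 5.75 rounded down
smallest_width = math.ceil(inclusive_range / 15)    # 4.60 rounded up

# Steps 3 and 4: class limits and tabulation with an interval of 5.
table = grouped_frequency(scores, 5)
for lower, upper, midpoint, f in table:
    print(f"{lower:3d}-{upper:<3d}  midpoint {midpoint:5.1f}  f = {f}")
print("n =", sum(row[3] for row in table))
```

Calling the function with a different `width` regroups the same scores, so the effect of the class interval on the number of classes and on the midpoints can be examined directly.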


To be sure that you understand Steps 3 and 4, above, stop at this point 
and construct a grouped frequency distribution of the 38 scores, using a class 
(grouping) interval of 6. Does the number of classes that result meet the 
12-15 criterion suggested in Step 2, above? 

When two or more groups of data are to be compared, it is usually
best to include all the data in the same table. In that case there will be a
column for the classes into which the scores are grouped and one for each
of the schools or grades being compared. Table 3.4 shows a frequency
table which combines the record of six schools on a certain test. The number
of grouping intervals varies from 9 for School F to 17 for Schools A and D,
although some intervals have no tallies.

TABLE 3.3 AN ILLUSTRATION OF THE PROCESS OF MAKING A GROUPED FREQUENCY
DISTRIBUTION

Original scores (from Table 3.1)

     90    66   106    84   105    83   104    82    97    97
     59    95    78    70    47    95   100    69    44    80
     75    75    51   109    89    58    59    72    74    75
     81    71    68   112    62    91    93    84

Steps in making the distribution

Step 1. Determining the range.

     Highest score    112
     Lowest score      44
     Range = Difference + 1 = 68 + 1 = 69

Step 2. Selecting the class interval.

     69 ÷ 12 = 5.75, largest class interval desirable. Round down to 5.
     69 ÷ 15 = 4.60, smallest class interval desirable. Round up to 5.

Steps 3 and 4. Determining the limits of the classes and making
the tabulation.

     Whole-number limits
     of the 15 classes       Tally        Frequency (f)

     110-114                 /                  1
     105-109                 ///                3
     100-104                 //                 2
      95-99                  ////               4
      90-94                  ///                3
      85-89                  /                  1
      80-84                  //////             6
      75-79                  ////               4
      70-74                  ////               4
      65-69                  ///                3
      60-64                  /                  1
      55-59                  ///                3
      50-54                  /                  1
      45-49                  /                  1
      40-44                  /                  1

                                            n = 38


The Form of the Table

A few words should be said about the purely mechanical makeup of the table
as it often occurs in typed or printed form; Table 3.4 and the other tables
in this book illustrate one format for the printed table. Each table bears an
TABLE 3.4 A DISTRIBUTION OF READING READINESS SCORES FOR EACH OF SIX SCHOOLS
IN A CERTAIN CITY

School All six 

— — — ‘ schools 


Score 


120-124

115-119

110-114

105-109

100-104 3

95-99 6

90-94 5 2

85-89 4 4

80-84 2 3

75-79 10 5

70-74 6 2

65-69 9 4

60-64 4 5

55-59 1

50-54 1

45-49 1

40-44 

35-39 1 1 

30-34 2 

25-29 1 

20-24 
15-19 

10-14 1 


1 



7 

15 

23 

31 

18 

29 

26 

29 

21 

13 

5 

2 

2 

5 

2 

2 

1 


X 45 38 38 37 40 36 234 


identifying number. Although either Roman or Arabic numerals may be 
used, Arabic numerals seem to be increasingly favored. The table number 
may be centered above the table title, or it may be given at the beginning 
of the title. The table often starts with a single or double horizontal line 
and usually ends with a single horizontal line. Another horizontal line 
separates the column headings from the table body, and other horizontal 
lines separate any summarizing measures that are given under the table 
proper. Vertical lines may be used to separate the columns, but usually no 
lines are drawn along the margins of the page. It is considered good form 
to avoid abbreviations in the table whenever possible, and to make the title 
and headings complete enough to indicate the contents of the table clearly. 

Wallis and Roberts (1965, Chap. 9) present an excellent chapter on the 
art of reading statistical tables. They remind the reader of the obvious 
precautions, e.g., read the title and headnote carefully, and they demonstrate 
some subtle and sophisticated techniques for extracting hidden information 
in tables. Wallis and Roberts have managed to do all this and be entertaining 
as well. 
One of the most efficient and useful methods of describing a group of obser- 
vations is by means of quantiles. Quantile is a general concept; percentiles, 
deciles, and quartiles are three examples of it. A quantile is a point on a 
number scale which is assumed to underlie a set of observations; the quantile 
divides the set of observations into two groups with known proportions in 
each group. For example, there are three quartiles ($Q_1$, $Q_2$, and $Q_3$); they 
divide a group of observations into four quarters. $Q_1$ is that point on the 
number scale such that one-fourth the observations lie below it; one-half the 
observations lie below $Q_2$, and three-fourths of the observations lie below 
$Q_3$. Thus the three quartiles divide a set of observations into four portions 
that are equal in terms of the proportion of observations in each portion. 
The 99 possible percentiles ($P_1, \ldots, P_{99}$) divide a set of observations into 
100 portions, each of which contains an equal number of observations. The 
nine deciles ($D_1, \ldots, D_9$) divide a set of observations into ten equal portions. 

If 25% of the observations are always below $P_{25}$, the 25th percentile, and 
the same is true for $Q_1$, the first quartile, then $P_{25}$ must equal $Q_1$. Figure 3.1 



FIG. 3.1 Relationships among quantiles. 

shows the relationships among the various quantiles defined so far and still 
another type called quintiles. (Quint for five.) The four quintiles divide a 
group into fifths. We shall denote them $K_1$, $K_2$, $K_3$, and $K_4$ since the letter Q 
was used for quartiles. 

Quantiles are very useful for summarizing data. Simply reporting that 
$P_5$ is 10.75 and $P_{15}$ is 16.80 tells immediately that 5% of the observations 
are less than 10.75 and that 10% of them lie between 10.75 and 16.80. For 
some large groups of data that are familiar, the entire collection of obser- 
vations can be pictured in a reader’s mind if he knows only the values of 
three or four percentiles, for example. All too often, more complicated 
summary measures (such as those to be encountered soon) are used to 
describe data when certain quantiles are more easily calculated and more 
readily understood. It is regrettable that quantiles are not more widely 
used by persons attempting simply to describe a set of data. 
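The relationships among quantiles described above (for instance, that the first quartile equals the 25th percentile) can be made concrete with a small sketch. The function name `as_percentile` is hypothetical and is introduced only for illustration.

```python
def as_percentile(kind, k):
    """Express the k-th quartile, quintile, or decile as a percentile rank.

    For example, the first quartile is the 25th percentile and the
    third quintile is the 60th percentile.
    """
    parts = {"quartile": 4, "quintile": 5, "decile": 10, "percentile": 100}
    n_parts = parts[kind]
    if not 1 <= k < n_parts:
        raise ValueError("k must lie between 1 and the number of portions minus 1")
    # 100 is a multiple of 4, 5, 10, and 100, so the division is exact.
    return 100 * k // n_parts
```

Because every quartile, quintile, and decile maps to a whole-number percentile in this way, a method for finding percentiles suffices for all of them.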
Determining Percentiles 

Because of the relationships among various quantiles noted in Fig. 3.1, one 
need know only how to determine percentiles to find the values of any 
quantiles desired. (Typically, no one ever wishes to divide a group of 
observations into more than 100 quantiles.) 

The definition of a percentile is simple: the Pth percentile is the point 
below which P percent of the scores lie. Calculating a given percentile is 
slightly more complicated than the definition might lead you to believe. 

Before beginning the calculation of any percentile in a group of scores 
one must arrange the scores from smallest to largest. This may be a time- 
consuming operation in large groups of scores, and it may be more con- 
venient to tabulate the scores in a grouped frequency distribution. The method 
we will present for finding a percentile point is general and applies to either 
a ranking or a grouped frequency distribution of the scores. 

A teacher administered a 40-item achievement test to his 125 students. 
The test score was taken to be the number of questions answered correctly. 
An ungrouped frequency distribution of the 125 test scores appears in Table 
3.5. 

What is the 25th percentile in the group of 125 test scores, i.e., what is 
the value of $P_{25}$? $P_{25}$ is the point below which 25% of the 125 scores lie. 

The calculation of any percentile will be facilitated if a cumulative 
frequency distribution is constructed. The cumulative frequency up to any 
given score is the total number of frequencies at or below that score. In the 
third column in Table 3.5 you will find the cumulative frequencies for the 
125 achievement test scores. Notice, for example, that there are 106 persons 
with test scores of 33 or less. The cumulative frequency through the test 
score of 33 is 106. 

The calculation of $P_{25}$ can be accomplished in five steps: 

Step 1. Find (.25)n by dividing n by 4: 

$$.25n = \frac{n}{4} = \frac{125}{4} = 31.25.$$

Step 2. Determine the lower real limit L of the score class 
containing the 31.25th person from the bottom. 

Because 16 persons scored at or below a score of 28, and 34 persons 
scored at or below a score of 29, the 31.25th person lies in the score class 
28.5-29.5; hence L = 28.5. It is assumed that the 18 frequencies at a score 
of 29 are evenly spread along the score class 28.5-29.5. Each frequency 
occupies (1/18) 
TABLE 3.5 DETERMINATION OF $P_{25}$, THE 25th PERCENTILE, IN A FREQUENCY DISTRIBUTION WHERE THE SCORE-CLASS INTERVAL IS ONE

     Test                    Cumulative
     score     Frequency     frequency

      38           1            125
      37           1            124
      36           3            123
      35           5            120
      34           9            115
      33           8            106
      32          17             98
      31          23             81
      30          24             58
      29          18             34
      28          10             16
      27           3              6
      26           1              3
      25           0              2
      24           2              2

               n = 125

Calculations

Step 1. 0.25n = n/4 = 125/4 = 31.25.

Step 2. Find lower real limit of score class containing 31.25th score:
        L = 28.5.

Step 3. Subtract the cumulative frequency up to L from 31.25:
        31.25 - 16 = 15.25.

Step 4. Divide the result of Step 3 by the frequency f in the interval
        containing the 31.25th score:
        15.25/18 = 0.85.

Step 5. Add the result of Step 4 to L:
        $P_{25}$ = 28.5 + 0.85 = 29.35.


of a unit. Determining how much of the interval the 31.25th score cuts off 
is a problem of interpolation within the interval. Steps 3 and 4 accomplish 
the interpolation. 

Step 3. Subtract the cumulative frequency (cum. f) up to L 
from .25n. L is 28.5, and 16 frequencies have ac- 
cumulated up to L. Hence, 

$$.25n - \text{cum. } f = 31.25 - 16 = 15.25.$$

In Step 3, one determines how many frequencies in the interval 28.5-29.5 
lie below .25n. 

Step 4. Divide the result of Step 3 by the frequency f in the 
interval containing the .25nth frequency: 

$$\frac{.25n - \text{cum. } f}{f} = \frac{15.25}{18} = 0.85.$$

Step 4 is the determination of what fraction of the score-class interval 
lies below the .25nth frequency. There are 18 frequencies in the interval 
28.5-29.5, and 15.25/18 = 0.85 of the interval is occupied by the first 15.25 
frequencies. 

Step 5. Add the result of Step 4 to L. The sum is $P_{25}$. 

$$P_{25} = 28.5 + 0.85 = 29.35.$$

Under the conventions we have adopted for the presentation of the 
scores, $P_{25}$ = 29.35; i.e., 25% of the 125 scores lie below 29.35. (Similarly, 
75% of the 125 scores lie above 29.35.) 

Steps 1 through 5 can be expressed in a single formula: 

$$P_{25} = L + \frac{.25n - \text{cum. } f}{f}, \tag{3.1}$$

where L is the lower real limit of the score interval of length 1 containing the 
.25nth frequency from the bottom of the distribution, 
cum. f is the cumulative frequency up to L, and 
f is the frequency within the score interval containing the .25nth 
frequency. 

A more general form of Eq. (3.1) applies to the determination of any 
percentile in a frequency distribution whose score-class interval is 1. Suppose 
we wish to find the point which exceeds some proportion p of the frequencies. 
$P_p$ represents the pth percentile. 

$$P_p = L + \frac{pn - \text{cum. } f}{f}, \tag{3.2}$$

where L is the lower real limit of the score interval containing the pnth 
frequency, 

cum. f is the cumulative frequency up to L, and 

f is the frequency of scores in the interval containing the pnth score. 


We shall illustrate the use of Eq. (3.2) by calculating $P_{60}$ from the data 
in Table 3.5. Here pn = .60(125) = 75, so that 

$$P_{60} = 30.5 + \frac{75 - 58}{23} = 31.24.$$
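As a check on the arithmetic, Eq. (3.2) can be written out as a short Python function and applied to the frequencies of Table 3.5. This is a sketch under stated assumptions: the name `percentile_unit_interval` is hypothetical, scores are taken to be whole numbers so that the lower real limit of a score class is the score minus 0.5, and frequencies are assumed to be spread evenly over each class.

```python
def percentile_unit_interval(freq_by_score, p):
    """Eq. (3.2): P_p = L + (p*n - cum_f) / f for classes one unit wide.

    freq_by_score maps each whole-number score to its frequency;
    p is a proportion between 0 and 1.
    """
    n = sum(freq_by_score.values())
    target = p * n  # the "pn-th" frequency from the bottom
    cum_f = 0       # cumulative frequency up to the current class
    for score in sorted(freq_by_score):
        f = freq_by_score[score]
        if f > 0 and cum_f + f >= target:
            lower_real_limit = score - 0.5
            return lower_real_limit + (target - cum_f) / f
        cum_f += f
    raise ValueError("p must be between 0 and 1")


# Frequencies from Table 3.5 (score: frequency).
table_3_5 = {24: 2, 25: 0, 26: 1, 27: 3, 28: 10, 29: 18, 30: 24, 31: 23,
             32: 17, 33: 8, 34: 9, 35: 5, 36: 3, 37: 1, 38: 1}

p25 = percentile_unit_interval(table_3_5, 0.25)
p60 = percentile_unit_interval(table_3_5, 0.60)
print(round(p25, 2), round(p60, 2))
```

Rounded to two places, the two results agree with the values 29.35 and 31.24 obtained by hand.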


Calculation of any percentile point from a grouped frequency distribution 
is almost identical to the calculations for the ungrouped distribution. In 
fact, the formula for the grouped frequency distribution that will be developed 
includes Eq. (3.2) as a special case when the score interval is one unit wide. 

The data in Table 3.6 are the ages to the nearest year of 1982 teachers 
who participated in special summer programs for the improvement of the 
teaching of selected high-school subjects. 

The general formula for determining the pth percentile in a group of 
n scores is as follows: 

$$P_p = L + \frac{pn - \text{cum. } f}{f} \cdot W, \tag{3.3}$$

where L is the lower real limit of the interval containing the pnth frequency 
from the bottom, 

cum. f is the cumulative frequency up to L, 

f is the frequency in the interval containing the pnth frequency, and 
W is the width of any score interval. 

Note that Eq. (3.3) is exactly the same as Eq. (3.2) when W = 1, i.e., when 
the score interval is one score wide, which means that scores are not grouped 
more finely than originally in constructing the frequency distribution. 

Application of Eq. (3.3) will now be illustrated by finding $P_{20}$ for the 
data in Table 3.6. Here pn = .20(1982) = 396.4, and the pnth score (the 
396th score) lies in the interval 24-27, which has a lower real limit of 23.5. 
The difference between pn and the cumulative frequency up to 23.5 is 
396.4 - 135. Noting that the frequency in the interval including the 396th 
score is 295 and the interval width is 4 units, we obtain: 

$$P_{20} = 23.5 + \frac{396.4 - 135}{295} \cdot 4 = 27.04.$$

TABLE 3.6 ILLUSTRATION OF THE CALCULATION OF $P_{50}$ IN A GROUPED FREQUENCY DISTRIBUTION

     Age                      Cumulative
     interval    Frequency    frequency

     64-67            4         1982
     60-63           38         1978
     56-59           82         1940
     52-55          120         1858
     48-51          125         1738
     44-47          160         1613
     40-43          221         1453
     36-39          204         1232
     32-35          307         1028
     28-31          291          721
     24-27          295          430
     20-23          135          135

Calculations

Step 1. 0.50n = 0.50(1982) = 991.

Step 2. Find lower real limit of score class containing 991st score:
        L = 31.50.

Step 3. Subtract the cumulative frequency cum. f up to L from 991:
        991 - 721 = 270.

Step 4. Divide the result of Step 3 by the frequency f in the interval
        containing the 991st score:
        270/307 = 0.88.

Step 5. Multiply the result of Step 4 by the width W of the score class:
        (0.88)(4) = 3.52.

Step 6. Add the result of Step 5 to L:
        $P_{50}$ = 31.50 + 3.52 = 35.02.

Under the conventions adopted for the calculation of percentiles, we 
can say that 20% of the teachers were younger than 27.04 years. We do not 
expect this statement to be exactly true. Errors have entered the process 
of determining the percentiles at two points. First, ages were reported to the 
nearest year instead of to the nearest month, day, hour, or minute. Second, 
it was assumed that the frequencies within each score interval were evenly 
distributed over the entire interval. Undoubtedly this assumption was false: 
at the younger ages frequencies are probably stacked up at the upper end of 
each score interval; at the older ages, frequencies are probably stacked up 
at the lower end of each interval. In making the assumption of evenly 
distributed frequencies, a compromise between computational labor and 
errors of approximation was made. The error made in approximating $P_{20}$ 
or any other percentile is probably immaterial relative to the difficulties that 
would arise if we assumed that the frequencies were distributed in any manner 
other than evenly on each score interval. 
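The grouped-data formula, Eq. (3.3), lends itself to the same kind of sketch. The function name `percentile_grouped` is hypothetical, and the code assumes, as the text does, whole-number scores (so each lower real limit is the lower whole-number limit minus 0.5), classes of equal width W, and frequencies spread evenly within each class.

```python
def percentile_grouped(classes, p, width):
    """Eq. (3.3): P_p = L + ((p*n - cum_f) / f) * W.

    classes is a list of (lower whole-number limit, frequency) pairs;
    p is a proportion between 0 and 1; width is the class width W.
    """
    n = sum(f for _, f in classes)
    target = p * n
    cum_f = 0
    for low, f in sorted(classes):
        if f > 0 and cum_f + f >= target:
            lower_real_limit = low - 0.5
            return lower_real_limit + (target - cum_f) / f * width
        cum_f += f
    raise ValueError("p must be between 0 and 1")


# Age intervals and frequencies from Table 3.6 (lower limit, frequency).
table_3_6 = [(20, 135), (24, 295), (28, 291), (32, 307), (36, 204), (40, 221),
             (44, 160), (48, 125), (52, 120), (56, 82), (60, 38), (64, 4)]

p20 = percentile_grouped(table_3_6, 0.20, 4)
p50 = percentile_grouped(table_3_6, 0.50, 4)
print(round(p20, 2), round(p50, 2))
```

Rounded to two places, the results match the hand calculations of 27.04 for the 20th percentile and 35.02 for the 50th. Setting `width` to 1 reduces the function to Eq. (3.2).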

That percentiles are fractional (e.g., 27.04) in measuring age is reasonable. 
We have no difficulty imagining that someone is precisely 27.04 years of age. 
What if the variable being measured is discrete? Suppose we build a fre- 
quency distribution of sizes of kindergarten classes in a large city school 
system. This variable, “size of class,” can take only values such as 25, 26, 
27, 28, etc. It is absurd to speak of a class with 27.31 students. Yet if we 
were to build a frequency distribution of class sizes and calculate percentile 
points by the methods in this chapter, we would most certainly obtain 
fractional values. Can such fractional percentile points be taken seriously? 
Isn't it utterly false that 81% of the classes can have a size below 32.41 while 
89% have a size below 32.78? Of course, it is utterly false. The same percent 
of the classes have 32.41 or fewer students and 32.78 or fewer students, and 
this percent is exactly the percent of classes with 32 or fewer students. Even 
though fractional percentiles in distributions of discrete variables cannot be 
reconciled with common sense, they are useful and widely employed. The 
alternative to using them is to abandon this convenient and helpful process 
of converting scores to percentiles and adopt some more cumbersome 
procedure. No one seems willing to do this just because a class with 32.50 
students seems a little ridiculous. 

3.4 

GRAPHING DATA 

There can be little doubt that the graphical representation of educational 
data is a valuable supplement to statistical analysis and summarization. 
A graph or chart tends to attract the reader’s attention. The average casual 
reader is likely to give scant attention to the ordinary printed matter in a 
research report and to be unimpressed by the mass of tabular data often 
piled up at the end. However, his eye is likely to be arrested by any picture 
or chart that may happen to be included, and this may lead him to read 
the entire discussion. 

A graph is often an effective method of clarifying a point. One small 
graph will often make a point more clearly than a dozen tables or paragraphs. 
It is sometimes said that the facts speak for themselves. In reality, statistics 
often stand speechless and silent, tables are sometimes tongue-tied, and only 
the graph cries aloud its message. Ordinary numerical data are quite 
abstract; they convey their meaning vaguely and with effort to the average 
mind. The picture or graph is a more concrete representation. 

A wide variety of graphs and charts are shown in Fig. 3.2(a) and (b). 
In Fig. 3.2(b) basic information concerning motor buses in operation in the 
United States is given first in tabular form, followed by 15 different 
black-and-white graphs. Each graph conveys a unique point in an impressive 
manner. 

[...] of the function of graphs is a fitting conclusion to [...] 

[...] visual education in all aspects has become, [...] 
are clear from its content [...] and simply. Its purposes, which follow, 

[...] comprehension of data than is possible with textual matter [...] 

[...] A check of accuracy [...] 

planning and familiarity with the functions of [...] carried out through careful 
presentation that will describe statistical data with clarity and dramatic 
impact: [...] 

[...] distribution of scores graphically: the histogram or column diagram, the frequency 
polygon, and the smooth curve. 

The Histogram or Column 
Diagram 

The histogram is a series of columns, each having as its base one class interval 
and as its height the number of cases, or frequency, in that class. Figure 3.3 



FIG. 3.3 A histogram, or column diagram, representing the percentage 
values assigned to an arithmetic paper by 42 scorers. 

represents a histogram showing the distribution of percentage values assigned 
to an arithmetic paper by 42 scorers. Since the greatest frequency is 9, in 
the 59.5-64.5 class, it is not necessary to extend the vertical or frequency 
scale at the left above 9. And since the scores range from the 29.5-34.5 
class to the 74.5-79.5 class, it is necessary to represent the horizontal scale 
only through that distance. For clarity, however, it is customary to extend 



FIG. 3.4 A histogram, or column diagram, representing the distribution 
of the 83 IQ's in a small junior high school. 

the scale one class interval above and below that range. In order to avoid 
having the figure be too flat or too steep, it is usually well to arrange the 
scales so that the width of the histogram itself is about one and two-thirds 
times its height— that is, the ratio of height to width should be approximately 
3:5. A column is centered around the midpoint of the score-class interval. 
In actual practice it is customary to represent the histogram in outline form, 
rather than to show the full length of the columns. Figure 3.4 illustrates 
the shaded outline form of the histogram. 

The Frequency Polygon 

The process of constructing a frequency polygon is very much like that 
of constructing the histogram. In the histogram, the top of each column is 
indicated by a horizontal line, the length of one class interval, placed at the 
proper height to represent the frequency in that class. But in the polygon 
a point is located above the midpoint of each class interval and at the proper 
height to represent the frequency in that class. These points are then joined 
by straight lines. As the frequency is zero at the classes above and below 
those in the distribution, the polygon is completed by connecting the points 
that represent the highest and lowest classes with the base line at the midpoints 
of the class intervals next above and below. Figure 3.5 shows a polygon 
for the same data represented by a histogram in Fig. 3.4. 
A smooth curve widely used in representing test scores is the percentile 
curve or ogive. Figure 3.6 shows a percentile curve used to represent the 
percentage data already used to illustrate the histogram and the polygon. 
The points that determine the percentile curve are located on the horizontal 
line at the upper limit of each class, at the position that indicates on the 
horizontal scale the percentage of scores up to and including that class. 
Notice, also, that two columns have been added to the ordinary frequency 
table. The cumulative frequency column indicates the number of scores up 
to and including each class. For example, there is one score in the 30-34 
class, and there are two in the 35-39 class, making a cumulative frequency 
of 3 in the two lowest classes. The cumulative percent column shows what 
percentage each of these cumulative frequencies is of the total. In the 



FIG. 3.6 A percentile curve representing the percentage values assigned 
to an arithmetic paper by 42 scorers. 


illustration, the total, n, is 42. The first entry in this column is, of course, 
100; the second is 98, because 41 is 98% of 42; the third is 95, because 40 
is 95 % of 42 ; and so on for the others. Each value in the cumulative percent 
column is represented as a point on the upper limit of that class interval 
(the horizontal line separating that class from the class above it), since it 
includes the percentage of scores up through that class. These points 
determine the curve. As a rule, especially in small groups where irregularities 
are most likely to occur, it is best to miss some of the points in order to 
obtain a smooth and regular curve; but care should be taken in order to 
leave about as many points on one side of the line as on the other. In this 
way the ogive will fit the trend of the points as closely as possible. 
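The cumulative frequency and cumulative percent columns that supply the points of the ogive can be computed as in the following sketch. The function name `cumulative_percents` is hypothetical, and so are most of the frequencies: the text states only that n is 42, that the two lowest classes hold one and two scores, that the largest frequency is 9, and that the top three cumulative percents are 95, 98, and 100. The remaining frequencies below were chosen merely to be consistent with those facts.

```python
def cumulative_percents(frequencies):
    """Return (cumulative frequency, cumulative percent) for each class.

    frequencies must be ordered from the lowest class to the highest.
    Percents are rounded to the nearest whole number, as in the text.
    """
    n = sum(frequencies)
    running = 0
    result = []
    for f in frequencies:
        running += f
        result.append((running, round(100 * running / n)))
    return result


# Hypothetical frequencies for the ten classes 30-34, 35-39, ..., 75-79.
hypothetical = [1, 2, 3, 4, 6, 8, 9, 7, 1, 1]

for row in cumulative_percents(hypothetical):
    print(row)
```

Each pair gives the height (cumulative frequency) or the percent scale position of the point plotted at the upper limit of the corresponding class.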
     IQ         f    Tally

     145-149    1    x
     140-144    2    xx
     135-139    2    xx
     130-134    5    xxxxx
     125-129    8    xxxxxxxx
     120-124    5    xxxxx
     115-119   16    xxxxxxxxxxxxxxxx
     110-114   12    xxxxxxxxxxxx
     105-109   10    xxxxxxxxxx
     100-104    8    xxxxxxxx
      95-99     6    xxxxxx
      90-94     8    xxxxxxxx
      85-89     6    xxxxxx
      80-84     1    x
      75-79     1    x
               --
               91

FIG. 3.7 Bar graph made on a typewriter, showing the distribution of 91 
IQ’s in a junior high school. 
Typewriter Graphs 

A satisfactory bar graph can be made on the typewriter. Figures 3.7, 3.8, 
and 3.9 illustrate this type of graph. Other graphs, such as the circle or pie 
graph, and various picture graphs, or pictographs, are occasionally used 
in educational measurement; these are discussed especially by Spear (1952). 
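A typewriter bar graph is simple enough to generate mechanically. The sketch below is illustrative only; the function name `typewriter_bar_graph` and its layout choices (two spaces between columns, one mark per case) are assumptions, not a prescribed format.

```python
def typewriter_bar_graph(classes, mark="x"):
    """Return the lines of a bar graph built from typed characters.

    classes is a list of (label, frequency) pairs, highest class first.
    Each line shows the class label, its frequency, and one mark per case.
    """
    label_width = max(len(label) for label, _ in classes)
    freq_width = max(len(str(f)) for _, f in classes)
    return [
        f"{label:<{label_width}}  {f:>{freq_width}}  {mark * f}"
        for label, f in classes
    ]


# The top few classes of the IQ distribution shown in Fig. 3.7.
for line in typewriter_bar_graph([("145-149", 1), ("140-144", 2),
                                  ("135-139", 2), ("130-134", 5)]):
    print(line)
```

Passing a different `mark` for each group (for example "7", "8", and "9") gives bars that remain distinguishable when several groups are typed side by side.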


Score    Frequency for Grade   Bar Graph for School Grade
            7   8   9   Seventh      Eighth     Ninth

200-219             3                           999
180-199     1   4   5   7            8888       99999
160-179     3   3   7   777          888        9999999
140-159     4   9   7   7777         888888888  9999999
120-139    11   7  11   77777777777  8888888    99999999999
100-119     4   7   2   7777         8888888    99
80-99       4   2   1   7777         88         9
60-79       1   3       7            888
40-59           1                    8
20-39           1                    8

FIG. 3.9 Graph made on a typewriter, showing the overlapping of grades 
seven, eight, and nine in reading comprehension. 


Which Graph is Best? 

As we might expect, no one type of graph is equally good for all purposes. 
The histogram is the easiest of all to understand and therefore is usually 
best if no more than one distribution is being represented. But if two or 
more distributions are to be compared, frequency polygons (or relative 
frequency polygons) are usually better, since so many lines coincide when 
histograms are superimposed that the picture is likely to be confusing. The 
percentile curves have many advantages not possessed by other curves. An 
important one is that from them it is possible to estimate with a high degree 
of accuracy the quartiles, medians, and other similar points. As we will 
see in the next section, by means of percentile curves several groups can be 
presented, for convenient comparison, on a single graph. The main value 
of bar graphs, circle graphs, and picture graphs lies probably in school 
publicity and in the motivation of learning. “A successful graph,” as the 
prominent educator Douglas Scates pointed out long ago, “depends far 
more on careful thought and judgment than on techniques" (Scates, 1942). 
Representing Two or More 
Distributions Graphically 


It is often desirable to compare two or more distributions. For example, 
school administrators may wish to compare the entire distribution of verbal 
ability test scores of the pupils in one school with that of pupils in another 
school. Also, the overlapping of scores among the various grades within 
a single building is a striking way to present the need for individualized 
instruction and varied materials within the same grade. 


Representing Entire Distributions 

When it is important to compare two or more entire distributions, as would 
be the case in a study of the status of a school or school system, the choice 
will usually be between the frequency polygon and the percentile curve. We 
have already pointed out the difficulty of superimposing two or more histo- 
grams. A series of polygons may be drawn on the same sheet one above the 
other, or alongside each other. In Fig. 3.9, a method of showing overlapping 
by using bar graphs made on the typewriter is illustrated. (Perhaps the 
scores there are grouped too coarsely. According to the conventional rule 
it would be better to have 12 to 15 score classes, instead of the 7, 9, and 7 
actually used in Fig. 3.9.) 

The Use of Polygons 

The distinct advantage of polygons over histograms for representing a series 
of distributions is that polygons can be superimposed on each other with less 
crossing of lines. In this form, comparisons among distributions are more 
easily made. Figure 3.10 illustrates this possibility with the distribution of 
reading comprehension scores for the 100 seventh, 100 eighth, and 100 ninth 
grades of a certain school. (For some purposes it could be quite misleading 
to graph overlapping frequency polygons for different sizes of groups. In 
these instances, frequencies should be converted to relative frequencies, 
that is, proportions, before graphing.) The great overlapping in reading 
ability of the three grades stands out clearly. But even with only three 
distributions, the lines cross and recross so many times that it becomes 
difficult to make any accurate comparison of one grade with another. More 
than three classes can hardly be represented in the same graph by frequency 
polygons without considerable confusion. 
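Converting frequencies to relative frequencies before graphing groups of different sizes takes one line of arithmetic per class. The sketch below is illustrative; the function name `relative_frequencies` and the two small groups are hypothetical.

```python
def relative_frequencies(frequencies):
    """Convert class frequencies to proportions of the group total."""
    n = sum(frequencies)
    return [f / n for f in frequencies]


# Two hypothetical groups of different sizes with the same shape.
small_group = [2, 6, 2]      # n = 10
large_group = [20, 60, 20]   # n = 100

print(relative_frequencies(small_group))
print(relative_frequencies(large_group))
```

Plotted as raw frequencies, the larger group would tower over the smaller one; plotted as proportions, the two polygons coincide, which is the comparison of shape that is usually wanted.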


The Use of Percentile Curves 


For the graphic comparison of two or more distributions the percentile 
curve has some distinct advantages. Since the frequencies are reduced to 


comprehension scores, for the same grades as in Figs. 3.9 and 3.10, in the 
form of percentile curves. 

From these percentile curves we can observe several relationships that 
were not apparent in the polygons. It is clear that although the seventh 
and eighth grades have almost exactly the same average scores, the eighth 
grade has greater variability. This is evident since the upper half of the eighth 
grade exceeds the upper half of the seventh grade, while the lower half of 
the eighth grade falls behind the lower half of the seventh. 

Furthermore, although the ninth grade runs consistently above the other 
two grades, about 15 percent of the ninth-grade pupils fall below the median 
of the seventh and eighth grades. 


3.6 
9. It is advisable not to show any more coordinate lines than necessary 
to guide the eye in reading the diagram. 

10. The curve lines of a diagram should be sharply distinguished from 
the ruling. 

11. In curves representing a series of observations, it is advisable, 
whenever possible, to indicate clearly on the diagram all the curves 
representing the separate observations. 

12. The horizontal scale for curves should usually read from left to 
right and the vertical scale from bottom to top. 

13. Figures for the scales of a diagram should be placed at the left and 
at the bottom or along the respective axes. 

14. It is often desirable to include in the diagram the numerical data 
or formulae represented. 

15. If numerical data are not included in the diagram, it is desirable 
to give the data in tabular form accompanying the diagram. 

16. All lettering and all figures on a diagram should be placed so as to 
be easily read [...] 
a. Determine the inclusive range of the above set of scores. 

b. Divide the inclusive range [found in (a)] by 12 and by 15, separately. 

c. Choose the class size to be the whole number which lies between the two 
quotients found in (b) above. 

d. Using the class size found in (c) above, construct a grouped frequency 
distribution for the 75 scores: 

Interval Score interval Tally Frequency Cumulative freq. 


14 145-149 

13 
12 
11 
10 
9 
8 
7 
6 
5 
4 
3 
2 

1 80-84 


75 


2. Find the 50th percentile, P 50 , for the 75 intelligence-test scores from the grouped 
frequency distribution constructed in Prob. 1. 

3. Complete the frequency polygon started below for the grouped frequency dis- 
tribution in Prob. 1. Be sure to label the horizontal axis with the midpoints 
of the 14 score intervals. 


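[Figure: partially drawn frequency polygon. Vertical axis: frequency, marked at 5, 10, and 15. Horizontal axis: Intelligence test score.] 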
4. Plot two relative frequency polygons on the same graph from the following 
grouped frequency distributions of Scholastic Aptitude Test Verbal scores for 
the 903 men and the 547 women in the freshman class of a large Eastern university: 

                      Men                       Women
     Score                   Relative                  Relative
     interval   Frequency    frequency    Frequency    frequency

     750-800         1         .001            4         .007
     700-749        27         .030           28         .051
     650-699        63         .070           56         .102
     600-649       138         .153           85         .155
     550-599       174         .193          117         .214
     500-549       202         .224          128         .234
     450-499       171         .189           86         .157
     400-449        96         .106           32         .059
     350-399        25         .028            9         .016
     300-349         4         .004            1         .002
     250-299         1         .001            1         .002
     200-249         1         .001            0         .000

               n_m = 903      1.00       n_w = 547      1.00
MEASURES OF CENTRAL TENDENCY 


4.1 

INTRODUCTION 

We saw in Chapter 3 how the properties of a collection of scores can be 
depicted graphically or in tabular form. Frequently a graph or table of 
data tells us more than we want or need to know, and the message it conveys 
may be time-consuming to communicate. Usually we single out for 
description just two or three properties of a set of scores. These properties 
(e.g., the typical “size” of the scores and their spread) may be describable 
by indexes known as summary statistics. In this chapter we shall study 
summary statistics which describe the typical “size” of a set of scores. 

The descriptive indexes that will be developed in this chapter would be 
used to answer a question like “How tall is the typical male graduate student 
in this university?” If the scores in a set are thought of as positioned along 
a number line, the property of the set that we now want to describe is where 
along that line the scores tend to lie. Are they bunched around a score of 
71, or do they center about a score of 67? Different measures of the central 
tendency of a set of scores imply different definitions of a “central score.” 
There are only a few summary measures of central tendency in common use, 
and in the following sections we shall study them in detail. 
THE MODE 


The most easily obtained measure of central tendency is the mode. The mode 
is the score in a set of scores that occurs most frequently. Not every set 
of scores has a single mode by a strict interpretation of this definition, so 
the working definition of the mode contains some qualifications and 
conventions that will be discussed after an illustration. 

In the set of scores (2, 6, 6, 8, 9, 9, 9, 10) the mode is 9 because it occurs 
more often than any other score. Note that the mode is the most frequent 
score (9 in this example) and not the frequency of that score (3 in this example). 


4.3 

CONVENTIONAL USE OF 
THE MODE 


1. When all of the scores in a group occur with the same frequency, 
it is customary to say that the group of scores has no mode. Thus 
there is no mode in the group (0.5, 0.5, 1.6, 1.6, 3.9, 3.9). 

2. When two adjacent scores have the same frequency and this common 
frequency is greater than that for any other score, the mode is the 
average of the two adjacent scores. Thus, the mode of the group 
of scores (0, 1, 1, 2, 2, 2, 3, 3, 3, 4) is 2.5. 

3. If in a group of scores two nonadjacent scores have the same 
frequency and this common frequency is greater than that for any 
other score, two modes exist. In the group of scores (10, 11, 11, 11, 
12, 13, 14, 14, 14, 17), both 11 and 14 are modes; the group of 
scores is said to be bimodal. Large sets of scores are often referred 
to as bimodal when they present a frequency polygon that looks 
like a Bactrian (two-humped) camel's back even though the 
frequencies at the two peaks are not strictly equal. This slight twisting of 
the definition is allowed because the term bimodal is so convenient 
and nicely descriptive. A convenient distinction can be made 
between major and minor modes. In a group of scores the major 
mode is the single score which satisfies the definition of the mode. 
However, several minor modes might exist at points throughout the 
group of scores. These minor modes are essentially local peaks on 
the frequency distribution. For example, in Fig. 4.1 the major 
mode is at 6 and minor modes exist at 3.5 and 10. 
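
The three conventions can be sketched in a few lines of Python. This is only an illustration, not a standard routine: the function name `conventional_mode` and the parameter `step` (the spacing between adjacent possible scores, needed to decide what "adjacent" means) are assumptions introduced here.

```python
from collections import Counter

def conventional_mode(scores, step):
    """Return a list of modes following the three conventions.

    `step` is the assumed spacing between adjacent possible score values.
    An empty list means "no mode" (convention 1).
    """
    counts = Counter(scores)
    top = max(counts.values())
    # Convention 1: all scores occur equally often, so there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    peaks = sorted(v for v, c in counts.items() if c == top)
    # Convention 2: two tied adjacent scores, so the mode is their average.
    if len(peaks) == 2 and abs((peaks[1] - peaks[0]) - step) < 1e-9:
        return [(peaks[0] + peaks[1]) / 2]
    # Ordinary case, or convention 3 (tied nonadjacent scores): report each peak.
    return peaks

print(conventional_mode([2, 6, 6, 8, 9, 9, 9, 10], step=1))
print(conventional_mode([0, 1, 1, 2, 2, 2, 3, 3, 3, 4], step=1))
print(conventional_mode([10, 11, 11, 11, 12, 13, 14, 14, 14, 17], step=1))
```

The sketch does not attempt to locate minor modes; that would require scanning the frequency distribution for local peaks.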


4.4 

THE MEDIAN 

You have already encountered the median (though it wasn’t called by that 
name) and learned how to find it in Sec. 3.2 on quantiles. 

Definition: The median, Md, is the 50th percentile in a group of 
scores. It is the score that divides the ranked scores 
into halves, such that half of the scores are larger 
than the median, and the other half are smaller. 


4.5 

CALCULATION OF THE 
MEDIAN 


1. If the data are an odd number of untied scores, e.g., 11, 13, 18, 19, 
20, the median is the middle score when they are ranked; Md = 18. 

2. If the data are an even number of untied scores, e.g., 4, 9, 13, 14, 
the median is the point halfway between the two central values when 
the scores are ranked: Md = (9 + 13)/2 = 11. 

3. If tied scores occur in the data, particularly at or near the median, 
a frequency tabulation of the scores will probably be necessary. 
Interpolation within a score class will often be necessary in such 
instances. For example, suppose that 36 scores ranging from 7.0 to 
10.5 have the following ungrouped frequency distribution: 

                      Cumulative
Score    Frequency    frequency

10.5      2              36
10.0      3              34
 9.5      2              31
 9.0      6              29
 8.5     10 = 5 + 5      23
 8.0      8              13
 7.5      4               5
 7.0      1               1

         n = 36
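
The three cases can be sketched in Python. The interpolation rule used below is an assumption made for illustration: each score class is taken to extend half a class width on either side of the reported score, and the scores in a class are treated as evenly spread over it. The names `median_ungrouped` and `median_interpolated` are hypothetical.

```python
def median_ungrouped(scores):
    """Median of untied scores: middle score, or midpoint of the two central scores."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def median_interpolated(freq, width):
    """Median from a frequency table {score: frequency} by interpolation.

    Assumes each class runs from score - width/2 to score + width/2.
    """
    n = sum(freq.values())
    below = 0  # number of scores below the current class
    for score in sorted(freq):
        f = freq[score]
        if below + f >= n / 2:
            lower_limit = score - width / 2
            return lower_limit + width * (n / 2 - below) / f
        below += f

table = {10.5: 2, 10.0: 3, 9.5: 2, 9.0: 6, 8.5: 10, 8.0: 8, 7.5: 4, 7.0: 1}
print(median_ungrouped([11, 13, 18, 19, 20]))
print(median_ungrouped([4, 9, 13, 14]))
print(median_interpolated(table, width=0.5))
```

For the 36 tabulated scores, 13 lie below the class at 8.5 and 5 more are needed to reach the 18th score, which is the split 10 = 5 + 5 marked in the table; under the stated assumption the interpolated median comes out at 8.5.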
The mean is no more than the familiar “average.” People often use 
the word “average” for any one of the three above-mentioned “measures 
of central tendency.” The name “mean,” being less ambiguous, is preferred. 

You may begin to question this proliferation of measures of central 
tendency, especially when someone is holding you responsible for learning 
them. However, each measure has characteristics that make it uniquely 
valuable. 

The mode is easiest to calculate; it can be found literally at a glance. 
Moreover, in very large groups of scores it is a fairly stable measure of the 
center of the distribution. In many distributions of large numbers of 
measures taken in educational and psychological research, the mode is close 
to two other measures of central tendency, the median and the mean. 

The median stands between the mode and the mean in computational 
effort, if the calculation is done by hand. This measure is obtained almost 
entirely by counting and can be easily calculated once the data have been 
arranged in numerical order. For a large number of scores, the data can 
first be arranged in a grouped frequency distribution (which is considerably 
easier than ranking the data), and then the median can be easily obtained. 
For purely descriptive purposes, classroom teachers and others working 
with small samples will find that percentile measures of central tendency 
and variability will serve them well enough. 

The mean of a set of data involves the most arithmetic computation. 
The value of the mean is affected by the individual values of all of the scores 
in the set of data. The median and mode may not be affected by all of the 
values. For example, observe what happens to the values of the mean, 
median, and mode when the largest score in the following set is doubled: 

                               Mean   Median   Mode

Set 1: 1, 3, 3, 5, 6, 7, 8     4.71     5        3

Set 2: 1, 3, 3, 5, 6, 7, 16    5.86     5        3

The value of the mean is especially affected by what might be called 
“outliers,” i.e., scores which lie far from the center of the group of scores. 
Whether this is an advantage depends upon the particular questions you 
are asking of the data. 
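
The comparison of the two sets can be checked with Python's standard `statistics` module. This is a small sketch; the helper name `summarize` is an assumption introduced here.

```python
from statistics import mean, median, mode

set1 = [1, 3, 3, 5, 6, 7, 8]
set2 = [1, 3, 3, 5, 6, 7, 16]  # same set with the largest score doubled

def summarize(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)

for label, scores in (("Set 1", set1), ("Set 2", set2)):
    m, md, mo = summarize(scores)
    print(f"{label}: mean = {m:.2f}, median = {md}, mode = {mo}")
```

Only the mean moves (from 33/7 to 41/7); the median and mode are the same for both sets.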


4.7 

CALCULATION OF THE MEAN 

The definition of the mean is so well known and the calculations by which 
it is found are so simple that you may wonder why its calculation deserves 
special attention. Part of the reason is that, traditionally, statistical 
calculations were performed by hand without the aid of machines. Under these 
circumstances, various methods of simplifying the operations in finding $\bar{X}$. 
group of scores is that they have the grouped frequency distribution shown in 
the left portion of Table 4.2. 


TABLE 4.2 ILLUSTRATION OF CALCULATION OF THE MEAN FROM A GROUPED 
FREQUENCY DISTRIBUTION 

Score               Midpoint of
interval     f      score interval     f · midpoint

70-74        1           72                  72
65-69        0           67                   0
60-64        3           62                 186
55-59        2           57                 114
50-54        6           52                 312
45-49       10           47                 470
40-44        8           42                 336
35-39        8           37                 296
30-34        4           32                 128
25-29        2           27                  54
20-24        4           22                  88
15-19        1           17                  17
10-14        1           12                  12

         n = 50               Σ f · (midpoint) = 2085

Lacking better information and for the sake of simplicity, we assume 
that the scores in any score interval are evenly distributed along the interval. 
This assumption is usually false and for this reason the value we shall 
calculate is only an approximation (a rather close one) to the mean of the 
ungrouped data. [The degree of error introduced into the calculation of 
many statistics by this assumption was the subject of some theoretical work 
by Sheppard (e.g., see Keeping, 1962, p. 107) to which you may wish to 
refer at some point later in your statistical training.] Under the assumption, 
the sum of the f scores in any score interval equals the product of f and the 
midpoint of the interval. These products appear as the last column in Table 
4.2. The grand sum of all of the scores in the group is approximately equal 
to the sum of these products. Thus, the mean of the ungrouped scores is 
approximately equal to this grand sum divided by n, i.e., 

$$\bar{X} \approx \frac{\sum f \cdot (\text{midpoint})}{n}, \qquad (4.3)$$


and the summation extends over all of the score intervals. 

For Table 4.2, the following approximation to the mean of the scores 
is obtained: 


$$\bar{X} \approx \frac{2085}{50} = 41.70.$$
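
The calculation from Table 4.2 can be reproduced with a short Python sketch. The function name `grouped_mean` and the layout of `table_4_2` as (interval, frequency) pairs are assumptions made for this illustration.

```python
# Each row: ((lower score, upper score), frequency), as in Table 4.2.
table_4_2 = [
    ((70, 74), 1), ((65, 69), 0), ((60, 64), 3), ((55, 59), 2),
    ((50, 54), 6), ((45, 49), 10), ((40, 44), 8), ((35, 39), 8),
    ((30, 34), 4), ((25, 29), 2), ((20, 24), 4), ((15, 19), 1),
    ((10, 14), 1),
]

def grouped_mean(rows):
    """Approximate the mean by treating every score as its interval midpoint."""
    n = sum(f for _, f in rows)
    total = sum(f * (lo + hi) / 2 for (lo, hi), f in rows)
    return total / n

print(grouped_mean(table_4_2))
```

The result agrees with the hand calculation, 2085/50 = 41.70, and is an approximation to the mean of the ungrouped scores for the reason given above.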
If each score in a set whose mean is $\bar{X}$ is multiplied by a constant c, 
the mean of the resulting scores is $c\bar{X}$, because $\sum cX_i/n = c\sum X_i/n = c\bar{X}$. 

A fourth property of the mean concerns the n deviation scores. The 
sum of the squared deviations of scores from their arithmetic mean is less than 
the sum of the squared deviations around any point other than $\bar{X}$. 

That is, $(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2$ is smaller in 
value than $(X_1 - b)^2 + (X_2 - b)^2 + \ldots + (X_n - b)^2$, where b is any number 
other than $\bar{X}$, the mean.* 

For example, the sum of the squared deviations of 0, 1, 1, 3, 5 around 2, 
their mean, is $(0 - 2)^2 + (1 - 2)^2 + (1 - 2)^2 + (3 - 2)^2 + (5 - 2)^2 = 
(-2)^2 + (-1)^2 + (-1)^2 + (1)^2 + (3)^2 = 16$. The sum of the squared 
deviations of 0, 1, 1, 3, 5 around 1 is equal to 21, which is greater than 16. 
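
This property can be checked numerically. The sketch below uses the five scores from the example; the function name `sum_sq_dev` and the trial offsets are assumptions chosen for illustration.

```python
def sum_sq_dev(scores, point):
    """Sum of squared deviations of the scores around `point`."""
    return sum((x - point) ** 2 for x in scores)

scores = [0, 1, 1, 3, 5]
n = len(scores)
mean = sum(scores) / n  # 2.0

print(sum_sq_dev(scores, mean))  # around the mean
print(sum_sq_dev(scores, 1))     # around 1

# Moving c units away from the mean adds n * c**2 to the sum.
for c in (-1, 0.5, 3):
    excess = sum_sq_dev(scores, mean + c) - sum_sq_dev(scores, mean)
    print(c, excess, n * c ** 2)
```

The last loop illustrates why the mean wins: the excess over the minimum is n times the squared distance from the mean, which is never negative.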

4.9 

MEAN, MEDIAN, AND MODE 

OF COMBINED GROUPS 

We might know the means, medians, and modes of three separate classrooms 
in a school and wish to find the same measures for all three classrooms com- 
bined. This will be a simple matter in the case of the mean, but for the 
median and mode it will be necessary to go back to the original data and 
make new calculations. The ease with which the mean of the combined 
groups is found reveals one of the advantages of defining summary statistics 
in terms of simple algebraic operations, such as adding and dividing, and 
having every score in a group exert an influence on the measure of central 
tendency. The median and mode are found by the operations of ranking 
and inspecting the data, respectively. 

The means and frequencies for classrooms A, B, and C are as follows: 

$\bar{X}_A = 11.9$     $n_A = 24$

$\bar{X}_B = 14.2$     $n_B = 30$

$\bar{X}_C = 10.8$     $n_C = 28$

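* Proof: 

$$\sum_{i=1}^{n} [X_i - (\bar{X} + c)]^2 = \sum_{i=1}^{n} [(X_i - \bar{X}) - c]^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 - 2c \sum_{i=1}^{n} (X_i - \bar{X}) + nc^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + nc^2,$$

because $\sum (X_i - \bar{X}) = 0$. Since $nc^2 \geq 0$, 

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 \leq \sum_{i=1}^{n} [X_i - (\bar{X} + c)]^2.$$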
The total n of all three classes combined is equal to $n = n_A + n_B + n_C = 82$. 
The mean of the combined groups is simply the sum of all 82 scores 
divided by 82. The sum of the 24 scores in class A is simply $n_A \cdot \bar{X}_A$ = 
24(11.9) = 285.6, because the mean is the sum of the scores divided by the 
number of scores, i.e., $\sum X = n\bar{X}$. Similarly for groups B and C, the sums 
of the scores are 30(14.2) = 426.0 and 28(10.8) = 302.4, respectively. 
The three sums total 285.6 + 426.0 + 302.4 = 1014.0. If we combine all 82 scores and sum them, 
we will also obtain 1014.0. Thus, the mean of all three classes combined 
is (1014.0)/82 = 12.4. Symbolically, the mean of the combined groups is 

$$\bar{X} = \frac{n_A \bar{X}_A + n_B \bar{X}_B + n_C \bar{X}_C}{n_A + n_B + n_C}. \qquad (4.4)$$

You should now be able to write the formula for the mean of four com- 
bined groups when you are given only the four means and numbers of scores 
per group. 

Notice that if each group is based on the same number of scores, 
$n_A = n_B = n_C = n$, then Eq. (4.4) becomes 

$$\bar{X} = \frac{n(\bar{X}_A + \bar{X}_B + \bar{X}_C)}{3n} = \frac{\bar{X}_A + \bar{X}_B + \bar{X}_C}{3}. \qquad (4.5)$$

This shows that if the three groups are the same size, the mean of the 
combined group is the same as the unweighted average of the three means. Of 
course, this is true for combining any number of means of equal-size groups. 
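
Equation (4.4) extends directly to any number of groups. The following Python sketch assumes each group is given as a (number of scores, mean) pair; the function name `combined_mean` is hypothetical.

```python
def combined_mean(groups):
    """Weighted mean of group means; `groups` is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    total_sum = sum(n * m for n, m in groups)  # each n * mean recovers that group's sum
    return total_sum / total_n

classes = [(24, 11.9), (30, 14.2), (28, 10.8)]
print(round(combined_mean(classes), 1))

# With equal group sizes the result is the unweighted average of the means.
equal = [(10, 1.0), (10, 2.0), (10, 6.0)]
print(combined_mean(equal))
```

The first call reproduces the classroom example (1014.0/82, about 12.4); the second illustrates Eq. (4.5) with made-up means.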

Attempting to find the median or the mode of a combination of groups 
is a different matter, however. Suppose you know that group A has six 
scores and the mode is 17 and that group B also has six scores and a mode 
of 19. What is the mode of groups A and B combined? Would you guess 
that it is 18 = (17 + 19)/2? If you did you might be wrong, because 
groups A and B might look like this: 


 A     B

15    15
15    15
17    17
17    19
17    19
18    19


When groups A and B are combined, the score of 15 occurs four times and is 
a mode of the combined groups. There was no way of knowing that this 
would happen merely from knowing the modes of A and B individually. 
For both the median and the mode, only if you have the original data in hand 
can you find these measures of central tendency on combined groups. 
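
A short Python check makes the point concrete. It assumes the two columns of the listing are read row by row, giving A = 15, 15, 17, 17, 17, 18 and B = 15, 15, 17, 19, 19, 19; that reading is an assumption about the printed table. `statistics.multimode` returns every most-frequent score.

```python
from statistics import multimode

# Assumed reading of the listing (row by row).
group_a = [15, 15, 17, 17, 17, 18]
group_b = [15, 15, 17, 19, 19, 19]

print(multimode(group_a))            # mode of A alone
print(multimode(group_b))            # mode of B alone
print(multimode(group_a + group_b))  # most frequent score(s) after combining
```

Under this reading, 15 is the mode of neither group, yet it occurs four times in the combined group (as does 17). Neither result could have been predicted from the two separate modes, 17 and 19.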
Each of the measures of central tendency we have presented has an interesting 
interpretation in terms of errors which result when a single statistic takes 
the place of each score in the group. The sense in which the mode is the 
most representative score or the score which best “takes the place of all of 
the scores” is fairly obvious. If we were forced to select one score to stand 
for every score in a group, the selected score would equal the score for which 
it stands the greatest number of times if it (the selected score) is the mode of 
the group. Or similarly, if we were being paid one dollar for each time we 
correctly guessed which score in a group would be selected by chance, we 
should make most money in the long run by always guessing the mode. 

One interpretation of the median of a group is not so obvious. Suppose 
that the scores in a group are placed along the real-number line. (1, 3, 6, 
7, 8) appear on the line below: 

                            Md
                            |
        o       o           o   o   o
    |---|---|---|---|---|---|---|---|
    0   1   2   3   4   5   6   7   8

The Md indicates the median of the group, 6. The distance between 
6 and 1 is 5 units; between 6 and 3, 3 units; between 6 and 6, 0 units; 
between 6 and 7, 1 unit; between 6 and 8, 2 units. The sum of these distances, 
5 + 3 + 0 + 1 + 2 = 11, is smaller than would be the sum of the distances 
of the five points from any other point on the line. (Try it and see for 
yourself.) The median of a group of scores is that point on the number line 
such that the sum of the absolute (i.e., unsigned) distances of all scores in the 
group to that point is smaller than the sum of distances to any other point.* 
If the median is taken in place of every score in the group, the least error 
results, provided “error” is defined as the sum of the absolute distances 
of each score to the score which will take its place. 

The interpretation of the mean has already been noted. The mean of a 
group of scores is that point on the number line such that the sum of the squared 
distances of all scores to that point is smaller than the sum of the squared 
distances to any other point. If the mean is taken in place of every score in 
the group, the least error results, provided “error” is defined as the sum of 
the squared distances of each score to the score that will take its place. 
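
Both interpretations can be tried out numerically, in the spirit of “try it and see for yourself.” The sketch below searches a grid of candidate points from 0 to 9 in steps of 0.1 for the scores (1, 3, 6, 7, 8); the grid and the function names are assumptions made for this illustration.

```python
def sum_abs_dev(scores, point):
    """Sum of absolute (unsigned) distances of the scores from `point`."""
    return sum(abs(x - point) for x in scores)

def sum_sq_dev(scores, point):
    """Sum of squared distances of the scores from `point`."""
    return sum((x - point) ** 2 for x in scores)

scores = [1, 3, 6, 7, 8]
candidates = [i / 10 for i in range(0, 91)]  # 0.0, 0.1, ..., 9.0

best_abs = min(candidates, key=lambda p: sum_abs_dev(scores, p))
best_sq = min(candidates, key=lambda p: sum_sq_dev(scores, p))

print(best_abs, sum_abs_dev(scores, best_abs))  # the median and its error
print(best_sq, sum_sq_dev(scores, best_sq))     # the mean and its error
```

On this grid the absolute-distance error is smallest at 6, the median, where it equals 11, and the squared-distance error is smallest at 5, the mean.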

* For a proof of this property of the median, see Horst (1931). 
Calculation of the mode, median, or mean is purely mechanical. Machines 
do it with far greater accuracy and speed than humans. However, the choice 
among these three measures and their interpretation may sometimes require 
judicious thought. Here are some considerations to keep in mind when you 
are faced with the choice: 


1. In small groups the mode can be quite unstable. The mode of the 
group (1, 1, 1, 3, 5, 7, 7, 8) is 1; but if one of the 1's is changed to 0 
and another to 2, the mode becomes 7. 

2. The median is not affected by the size of the “large” and “small” 
scores above or below it. For example, in a group of 50 scores 
the median will not change when the largest score is tripled. 

3. The mean is affected by the size of every score in the group. If 
any score is changed by c units, the mean will be changed in the same 
direction by c/n units. For example, if 100 is added to the third 
score in a group of 10 scores, the mean of the group is increased by 
10 units. 

4. Some groups of scores simply do not “tend centrally” in any 
[...] calculate one [...]. This is particularly true of groups of scores 
[...]. For example, an acquaintance of the authors who is [...] 
development maintains that he can build achievement tests composed 
of eight multiple-choice items that can [...] students into those who 
have acquired the concept of [...] adding numbers and those 
who have not. [...] “have-nots” [...] students produced scores [...] 
histogram [...] 
FIG. 4.3 A symmetrical, unimodal group of scores. 

score. The median of the group is approximately 2.17, even 
though the score that is two ranks above the middle score is 6! 
Neither the mean nor the median depicts this group of scores well. 
Perhaps the best simple summary of the size of the scores is the 
statement that “the histogram is bimodal and U-shaped with one 
mode at 0 and the other at 8.” 

5. The central tendency of groups of scores which contain extreme 
values is probably best measured by the median if the scores are 
unimodal. As pointed out previously, each score in a group 
influences the mean. Thus, one extreme value can move the mean 
of the group far away from what would generally be regarded as 
the central region. For example, if nine persons have incomes 
ranging from $4500 to $5200, with an average of $4900, and the 
tenth person’s income is $20,000, the average income of the group 
of 10 is $6410. This figure does not do justice to either group, 
though it would be an impressive figure for the president of a small 
company (whose salary was $20,000) to report as the average salary 
on his payroll. The median would be preferred as a measure of 
central tendency in this instance. Demographers, economists, and 
journalists often choose to report “median income" because they 
wish to avoid the problem just illustrated. 

6. In unimodal groups of scores that are symmetrical (i.e., the half of 
the histogram below the mode is the mirror image of the half above 
the mode), the mean, median, and mode are equal. For an example, 
see Fig. 4.3. The frequency polygon shows that the mean, median, 
and mode are all 40. 

Lack of perfect symmetry in the frequency polygon or histogram 
of a group of scores generally has a regular effect on the relation- 
ships among the mean, median, and mode. Suppose that a pre- 
ponderance of the scores in some group lie above a peak in the 
frequency polygon as in Fig. 4.4. 

In Fig. 4.4 the mode (Mo) equals 100, the median (Md) equals 
104.6 and the mean (X) equals 105.98. If a preponderance of the 
scores lie lower than the peak of a frequency polygon, it would 
FIG. 4.4 An asymmetrical frequency polygon illustrating the relationships 
among the mean, median, and mode. (Horizontal axis: Score.) 


generally be true that the mean would be the smallest number, the 
median would be next largest, and the mode would be the 
largest number. 

7. An additional consideration which bears on one’s choice of a 
measure of central tendency can be discussed only superficially with 
the concepts developed thus far. When the group of scores in hand 
is considered to be a sample from a much larger symmetrical group, 
the mean of the sample is probably closer to the center of the large 
group of scores than is either the median or the mode of the scores 
in hand. We shall return to this important point in Sec. 11.4. 

The following anecdote summarizes many of the problems that arise 
in the use of measures of central tendency: 

An improbable anecdote should make this problem of heterogeneity 
(i.e., that no measure of central tendency is an adequate description of all 
of the scores in a group) plain. Five men once sat near each other on a 
park bench. Two were vagrants, each with total worldly assets of 25 cents. 
The third was a workman whose bank account and other assets totaled 
$2000. The fourth man had $15,000 in various forms. The fifth was a 
multimillionaire with a net worth of $5,000,000. Therefore, the modal assets 
of the group were 25 cents. This figure describes [the financial assets of] 
two of the persons perfectly, but is grossly inaccurate for the other three. 
The median figure of $2000 does little justice to anyone except the workman. 
The mean, $1,003,400.10, is not very satisfactory even for the 
multimillionaire. If we had to choose one measure of central tendency, perhaps it would 
be the mode, which describes 40 percent of this group accurately. But if 
told that “the modal assets of five persons sitting on a park bench are 25 
cents,” we would be likely to conclude that the total assets of the group are 
approximately $1.25, which is more than five million dollars lower than the 
correct figure. Obviously, no measure of central tendency whatsoever is 
adequate for these “strange benchfellows,” who simply do not “tend 
centrally.”* 


* From p. 73 of J. C. Stanley, Measurement in Today's Schools, 4th Ed., 
© 1964. Reprinted by permission of Prentice-Hall, Inc., Englewood Cliffs, N.J. 
Another escape from the contradictions in this anecdote is to point out 
that no summary statistic is needed for a group of five scores. 
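
The figures in the anecdote can be verified with a brief Python sketch. It takes the fourth man's assets to be $15,000, the value implied by the stated mean of $1,003,400.10; the variable names are assumptions for illustration.

```python
from statistics import mean, median, mode

# Assets in dollars: two vagrants, the workman, the fourth man, the multimillionaire.
assets = [0.25, 0.25, 2000.00, 15000.00, 5000000.00]

print(mode(assets))    # modal assets
print(median(assets))  # median assets
print(mean(assets))    # mean assets

# Total implied by "the modal assets are 25 cents" versus the actual total.
implied_total = len(assets) * mode(assets)
actual_total = sum(assets)
print(implied_total, actual_total, actual_total - implied_total)
```

The mode, median, and mean come out at 25 cents, $2000, and $1,003,400.10, and the total implied by the mode falls short of the actual total by more than five million dollars.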


4.12 

OTHER MEASURES OF 
CENTRAL TENDENCY 

Numerous other ways of locating a “central value” in a group of scores 
exist. A few of these measures will be presented here despite the rarity of 
their appearance in the literature of educational research. 

The Geometric Mean 

Undoubtedly you recall that $\sqrt{4} = 2$ and that, more generally, if $A = a^2$, 
then $\sqrt{A} = a$. Perhaps your memory is somewhat more vague on the general 
question of roots of numbers. $\sqrt[3]{a}$ is read “the cube root of the number a.” 
It is that number which when raised to the third power equals a. 
The nth root of a, denoted $\sqrt[n]{a}$, is that 
number which when raised to the nth power (i.e., multiplied by itself n 
times) equals a. For example, $\sqrt[4]{16} = 2$ because $2 \cdot 2 \cdot 2 \cdot 2 = 16$. 

Definition: The geometric mean of the n positive numbers 
$X_1, \ldots, X_n$ is given by 

$$gm = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}, \qquad (4.6)$$

i.e., the geometric mean of $X_1, \ldots, X_n$ is the nth 
root of the grand product of all n of the X's. 

If any one of the scores were zero, gm would equal zero even if zero was 
(by any other definition) far from the center of the group of scores. 
The geometric mean is useful for describing the central location of 
rate-of-change scores (see Ferguson, 1959, p. 50). 


The Harmonic Mean 

This measure of central tendency is sometimes applied when a group of 
[...] being averaged. 
Definition: The harmonic mean of the positive scores $X_1, \ldots, X_n$ 
is given by 

$$hm = \frac{1}{\frac{1}{n}\left(\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_n}\right)} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}.$$

It is thus the reciprocal of the mean of the reciprocals of the scores 
(where $1/X_i$ is the reciprocal of $X_i$). To find hm, each score is divided into 
1, the resulting reciprocals are summed, and this 
sum is divided into n. 

The harmonic mean has very limited application. A contraharmonic 
mean is thought to have broader application than either the geometric or 
harmonic mean (see Senders, 1958, pp. 317-18); however, it is doubtful 
that any of these three measures is sufficiently familiar to most readers that 
the use of them in written reports would pass without considerable confusion. 
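
Both definitions translate directly into Python. The sketch below is an illustration under the stated definitions; the two sample scores are made up, and the function names are chosen here. (Python's `statistics` module also provides `geometric_mean` and `harmonic_mean`.)

```python
import math

def geometric_mean(scores):
    """nth root of the product of n positive scores."""
    return math.prod(scores) ** (1 / len(scores))

def harmonic_mean(scores):
    """n divided by the sum of the reciprocals of n positive scores."""
    return len(scores) / sum(1 / x for x in scores)

scores = [2, 8]  # hypothetical positive scores
print(geometric_mean(scores))       # square root of 16
print(harmonic_mean(scores))        # 2 / (1/2 + 1/8)
print(sum(scores) / len(scores))    # arithmetic mean, for comparison
```

For these two scores the harmonic mean (3.2) is smaller than the geometric mean (4), which is smaller than the arithmetic mean (5).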

The Ratio of Means and the 
Mean of Ratios 

An IQ score, as computed from most of the earlier intelligence tests, is 
100 times the ratio of a person’s mental age to his chronological age: 
IQ = 100(MA/CA). At first sight, it might appear that the average IQ 
of a group of persons could be found by dividing the average mental age 
of the group by the average chronological age and multiplying by 100, i.e., 
that $\overline{IQ} = 100(\overline{MA}/\overline{CA})$. However, this is not generally true. 

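Only under very special conditions will it be true that 

$$\frac{1}{n}\sum_{i=1}^{n} \frac{MA_i}{CA_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} MA_i}{\frac{1}{n}\sum_{i=1}^{n} CA_i},$$

i.e., that the mean of the ratios is the ratio of means (see Stanley, 1957b). 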
This seems like such a simple fact, but in a slightly more complicated form 
the error of assuming that the mean of ratios is the ratio of means sneaks 
into statistical literature (e.g., see Winer, 1962, pp. 61 and 119; Ferguson, 
1959, p. 258). 
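
A small numerical example shows the difference. The mental and chronological ages below (in months) are invented for illustration, as are the function names.

```python
def mean_of_ratios(ma, ca):
    """Average of the individual IQs, 100 * MA / CA."""
    return sum(100 * m / c for m, c in zip(ma, ca)) / len(ma)

def ratio_of_means(ma, ca):
    """100 times mean MA divided by mean CA."""
    return 100 * (sum(ma) / len(ma)) / (sum(ca) / len(ca))

# Hypothetical ages in months for three children.
ma = [60, 120, 150]
ca = [120, 120, 100]
print(mean_of_ratios(ma, ca))   # individual IQs are 50, 100, 150
print(ratio_of_means(ma, ca))

# One special condition under which the two agree: every CA is the same.
ca_equal = [120, 120, 120]
print(mean_of_ratios(ma, ca_equal), ratio_of_means(ma, ca_equal))
```

With unequal chronological ages the mean IQ is 100 while the ratio of means is about 97.06; with identical chronological ages the two quantities coincide.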

PROBLEMS AND EXERCISES 

1. Find the mean, median, and mode of the following set of scores: 1.2, 1.5, 1.6, 
2.1, 2.4, 2.4, 2.7, 2.8, 3.0, 3.0, 3.0, 3.1, 3.1, 3.1, 3.4. 





2. Suppose the number 0.5 is added to each of the 15 scores in Prob. 1 above. 
What will be the value of the mean and median of these 15 augmented scores? 

3. Find the mean and median of the 100 scores in the following grouped frequency 
distribution: 


Score interval Frequency 


20-22 8 

17-19 14 

14-16 41 

11-13 26 

8-10 7 

5-7 4 


n = 100 


4. Suppose that each of the 100 scores in the grouped frequency distribution in 
Prob. 3 is multiplied by 3. What would be the values of the mean and median 
of the 100 resulting scores? 

5. Find the mean and median in the following grouped frequency distribution: 


Score interval Frequency 


20-24 2 

15-19 11 

10-14 17 

5-9 13 

0-4 9 

(-5)-(-1) 7 


n = 59 


6. Group A contains 10 scores, the mean and median of which are 14.5 and 13, 
respectively. Group B contains 20 scores, the mean and median of which are 
12.7 and 10, respectively. What are the mean and median of the 30 scores 
obtained from combining Groups A and B? 

7. The seven members of the Sunday Afternoon Picnic Society (SAPS) live along a 
straight stretch of Highway 101. Their homes are positioned along the highway 
as follows: 



[Figure: seven homes in a row along the highway, separated in order by 5 mi, 2 mi, 5 mi, 2 mi, 1 mi, and 1 mi.] 





The cost of gas (3.5 cents per mile) for the travel of all members to the Sunday 
outing is taken out of the club treasury. Since any point along Highway 101 
is a fine place for a picnic, where along the road should the members hold their 
picnic so as to spend the minimum amount of money for travel? 

8. Find the mean SAT-V score from the n = 903 scores for men in Prob. 4 at the 
end of Chapter 3. (Hint: a quick way to find the mean is to multiply the mid- 
point of each score interval by the relative frequency of the scores in that interval, 
and then add up these products for all 12 intervals.) 
MEASURES 

OF 

VARIABILITY 


5.1 

INTRODUCTION 


Measures of central tendency tell us about the concentration of a group of 
scores on a number scale. Each measure of central tendency gives 
a score that represents in several senses all of the scores in the group. 
This process disregards the differences that exist among separate scores. 
Other descriptive statistics are required to measure the variation of scores 
[...]. [...] important functions of statistics [...] 
which is in a sense uncertainty, [...] be reduced, explained, or accounted for. 
The whole scientific enterprise [...] with notions of variability. 
When much unexplained variability [...], predictions can never be very 
accurate. But when explanations for why people or things 
differ [...], uncertainty can be reduced [...] of variability can be removed. 
For example, if nothing is known about why people differ in intelligence, 
one is faced with great uncertainty when attempting to predict intelligence; 
some people would appear “bright” and others “dull” and no one would 
know why. However, if it is known that heredity and environment produce 
quantifiable influences on measured IQ, then knowing a child's ancestry and 
early upbringing would permit a more accurate prediction of his adult 
intelligence. In other words, the variability of IQ’s for persons with like 
heredity and environment is less than it is for people in general. But before 
[...] loftier matters [...] learn about conventional [...]. 


5.2 

THE RANGE 


[...] the [...] distance along the number scale over [...]. 
[...] different definitions of the range have been given in the past. 
[...] to distinguish two [...] of range: the inclusive and the exclusive [...]. 

Definition: The exclusive range is the difference between the 
largest and smallest scores in a group. 

[...] and 1.6 have an exclusive range of 
1.6 - (-0.2) = 1.8. 

[...] containing the smallest score. 

The following heights are obtained [...] (measured to the nearest inch): 
[...] 62", 65", and 66". The actual height of the shortest boy is somewhere 
between 58.5" and 59.5", the lower real limit being 58.5". The upper real 
limit of the interval containing the largest score is 66.5". Thus the 
inclusive range equals 66.5" - 58.5" = 8", which is larger than [...]. 


FIG. 5.1 Illustration of the inclusive and the exclusive range. 
(Horizontal axis: height in inches.) 
The exclusive range is the distance between the smallest and largest 
reported scores in a group, and is thus likely to exclude an actual score that 
lies above the largest or below the smallest reported score (see Sec. 2.3 for 
definitions of actual and reported scores). 

The inclusive range is sufficiently large to include all actual scores as 
well as all reported scores. 

In the future, if we refer to “the range” without specifying inclusive 
or exclusive, the point being made will be equally valid for either. Although 
the meaning of the range as a measure of variability is quite clear, it has 
certain drawbacks. Because the range is determined by just two scores in 
the group, it ignores the spread of all scores except the largest and smallest. 
For example, if 100 scores are spread evenly from 1 through 10, the inclusive 
range is 10.5 - 0.5 = 10. But if one score is at 1 and one score is at 10 
and the remaining 98 scores are at 5, the inclusive range is still equal to 10. 
For almost any purpose, these two types of heterogeneity (one in which 100 
scores are uniformly distributed over a ten-unit interval and the other in 
which all but two of the 100 scores are clustered at the same point) have 
different meaning; but they cannot be distinguished by merely inspecting the 
range. The range is by far the crudest measure of variability commonly 
employed. 
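
The two kinds of range, and the weakness just described, can be sketched in Python. The parameter `unit` (the unit of measurement, so that real limits lie half a unit beyond each reported score) and the function names are assumptions introduced for this illustration.

```python
def exclusive_range(scores):
    """Difference between the largest and smallest reported scores."""
    return max(scores) - min(scores)

def inclusive_range(scores, unit=1):
    """Upper real limit of the largest score minus lower real limit of the smallest."""
    return (max(scores) + unit / 2) - (min(scores) - unit / 2)

# 100 scores spread evenly from 1 through 10 (ten scores at each value).
even = [value for value in range(1, 11) for _ in range(10)]
# One score at 1, one at 10, and the remaining 98 at 5.
clustered = [1] + [5] * 98 + [10]

print(exclusive_range(even), inclusive_range(even))
print(exclusive_range(clustered), inclusive_range(clustered))
```

Both groups have an inclusive range of 10 (and an exclusive range of 9), although their patterns of heterogeneity are very different.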


5.3 

D, THE 90TH-TO-10TH 
PERCENTILE RANGE 

A second measure of variability is D, the range between the 10th and 90th 
percentiles in a group of scores. It was defined by an educational statistician, 
Truman Kelley (1921), as 

$$D = P_{90} - P_{10}. \qquad (5.1)$$

D is somewhat more stable than the range because it is directly affected 
by a greater number of scores. It is easier to compute than other measures 
of variability we shall meet later in this chapter. Neither of these advantages 
has been great enough to make D a popular measure of variability. It is 
seldom used. 


5.4 

THE SEMI-INTERQUARTILE 
RANGE 


In Sec. 3.2 we considered the three quartiles of a distribution of scores: $Q_1$, 
the point on the scale below which 25% of the scores lie, $Q_2$ (the median), 
and $Q_3$, the point above which 25% of the scores lie. The distance between 
the first and third quartiles of a group of scores, i.e., $Q_3 - Q_1$, is called the 
interquartile range. The semi-interquartile range is half this distance. 

Using Q to denote the semi-interquartile range, we have the following: 

Definition: The semi-interquartile range Q is half the distance 
between the third and first quartiles, i.e., 

$$Q = \frac{Q_3 - Q_1}{2}. \qquad (5.2)$$

Q is an easily obtained and useful measure of variability. For de- 
scriptive purposes it is superior to the range on any criterion except com- 
putational simplicity. If two groups of scores have the same value of Q, 
they are much more likely to possess similar patterns of heterogeneity than 
are two groups with the same range. 

In distributions that are nearly symmetrical around the mean or median, 
Q can be used to reconstruct the score limits between which approximately 
50% of the scores are contained. If it is known that 250 scores which are 
approximately symmetrically distributed around a median of 63 have a 
semi-interquartile range of 11, then approximately 50% (125) of the scores 
lie between 


Md - Q = 63 - 11 = 52 
and 

Md + Q = 63 + 11 = 74. 


If the distribution of scores is very asymmetrical around the median, then as 
many as 70% of the scores might lie within the range Md - Q to Md + Q. 
The symmetric and asymmetric cases and the use of Q are illustrated in 
Fig. 5.2. 
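
The use of Q in the symmetric case can be illustrated with a short Python sketch. The data (the whole numbers 1 through 100) are hypothetical, and `statistics.quantiles` with `method="inclusive"` is one of several quartile conventions; it is not necessarily the interpolation procedure of Sec. 3.2.

```python
from statistics import median, quantiles

scores = list(range(1, 101))  # hypothetical symmetric group of 100 scores

q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
Q = (q3 - q1) / 2             # semi-interquartile range
md = median(scores)

inside = [x for x in scores if md - Q <= x <= md + Q]
share = len(inside) / len(scores)

print(q1, q3, Q)
print(md - Q, md + Q, share)
```

For this symmetric group the interval from Md - Q to Md + Q contains exactly half of the scores.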
THE VARIANCE 


The range, the semi-interquartile range Q, and $D = P_{90} - P_{10}$ are three 
measures of dispersion, scatter, heterogeneity, or variation met thus far. 
Each increases in value as the set of scores on which it is computed shows 
greater dispersion (less homogeneity). Note that, as with the mode and 
median, calculation of these three measures does not involve every individual 
score in a group of scores. We shall now encounter a fourth measure in 
the calculation of which, like the mean, every score is utilized. 

Deviation scores, scores of the form $X_i - \bar{X}$, reflect something about 
the variation in a set of scores. A set of scores with great heterogeneity 
will have some large deviation scores. What would the deviation scores be 
if all the scores in the set were 9 ? The mean would be 9, hence every deviation 
score would be 9 — 9 = 0. In the most homogeneous set of scores that it is 
possible to achieve, all the deviation scores equal zero. Some combination 
of the deviation scores might be a useful measure of variation. 

If we were to sum all of the deviation scores, would that sum reflect 
the variation in the original scores? No, since this sum is always exactly 
zero: $\sum (X_i - \bar{X}) = 0$. To overcome this we could square each deviation 
score and sum the resulting squared scores. Hence, for a given set of scores 
a measure of the form 

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = (X_1 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2$$

will be large when the scores are heterogeneous and small when they are 
homogeneous. We would not have had to square the deviation scores to 
get rid of their signs; we could have simply regarded all of them as positive 
(i.e., taken their absolute value). This method would have led to a different 
measure of variation called the mean deviation, which you will encounter 
in Sec. 5.9. The value of the above expression also depends upon how many 
scores are considered. The larger n, the larger the sum tends to be. This is 
a limitation if one wishes to compare the variability in two sets that differ 
in numbers of scores. This limitation is overcome by dividing the expression 
by n - 1. The resulting measure of variability is called the variance (denoted 
by $s^2$) and has the formula 

i(X<-X ) 2 

4- 1 - 1 , ; - ■ (5.3) 

n — 1 

Why did we divide by n — 1 instead of simply «? A satisfactory answer 
cannot be given at this point because as yet in this chapter we have not 
developed the concepts necessary to give the explanation much meaning. 

You will learn in Sec. 12.4 why n — I was chosen instead of n. 
We shall use the following data to illustrate the calculation of the 
variance of a set of six scores: 

Score     Score - Mean     (Score - Mean)^2 

  1       1 - 2 = -1              1 
  3       3 - 2 =  1              1 
  3       3 - 2 =  1              1 
  0       0 - 2 = -2              4 
  4       4 - 2 =  2              4 
  1       1 - 2 = -1              1 

 12            0                 12 

$$s_x^2 = \frac{12}{6 - 1} = \frac{12}{5} = 2.4$$ 

Suppose we had a second, more heterogeneous set of six scores with 
the same mean, 2. These data, and the calculations, are as follows: 

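  X       X - X.           (X - X.)^2 

  0       0 - 2 = -2              4 
  0       0 - 2 = -2              4 
  0       0 - 2 = -2              4 
  0       0 - 2 = -2              4 
  6       6 - 2 =  4             16 
  6       6 - 2 =  4             16 

 12            0                 48 

Here $\sum X_i = 12$, $\sum (X_i - \bar{X}_.) = 0$, and $\sum (X_i - \bar{X}_.)^2 = 48$, so that 

$$s_x^2 = \frac{48}{6 - 1} = \frac{48}{5} = 9.6$$ 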
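The two worked examples above can be checked with a few lines of Python. This is only a sketch of Eq. (5.3); the function name `sample_variance` and the variable names are illustrative choices, not notation from the text.

```python
def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1 (Eq. 5.3)."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    return sum(d * d for d in deviations) / (n - 1)


first_set = [1, 3, 3, 0, 4, 1]    # the six scores of the first example
second_set = [0, 0, 0, 0, 6, 6]   # the more heterogeneous second set

print(sample_variance(first_set))
print(sample_variance(second_set))
```

Both sets have mean 2, yet the second set, whose scores lie farther from the mean, yields the larger variance.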

In the above examples, $s_x^2$ was easily calculated because each score and 
the mean were whole numbers. Computation of $s_x^2$ as above would be 
tedious, however, if the mean were 17.697, for example. For this reason 
we seek, by algebraic manipulation, an expression for $s_x^2$ that is computation- 
ally simpler in such instances. 

$$s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_.)^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i^2 - 2\bar{X}_. X_i + \bar{X}_.^2)}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - 2\bar{X}_. \sum_{i=1}^{n} X_i + n\bar{X}_.^2}{n - 1}$$ 

If we recall that $\sum_{i=1}^{n} X_i = n\bar{X}_.$, we can write the above expression as follows: 

$$s_x^2 = \frac{\sum_{i=1}^{n} X_i^2 - 2n\bar{X}_.^2 + n\bar{X}_.^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}_.^2}{n - 1}$$ 

Since the square of the mean is the square of the sum of all the scores 
divided by the square of n, we have 

$$s_x^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}. \qquad (5.4)$$ 

We can also multiply the right-hand side of Eq. (5.4) by a fancy form of the 
number 1, namely $n/n$, to obtain another formula for $s_x^2$ that does not contain 
the mean: 

$$s_x^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}$$ 
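As a quick numerical check of the algebra above, the following sketch computes the variance three ways: from the definition, from Eq. (5.4), and from the form that avoids the mean. The function names and the second data set (non-integer scores) are hypothetical and chosen only for illustration.

```python
import math


def variance_definitional(scores):
    # Eq. (5.3): squared deviations from the mean, divided by n - 1.
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def variance_eq_5_4(scores):
    # Eq. (5.4): uses only the sum of scores and the sum of squared scores.
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def variance_without_mean(scores):
    # Eq. (5.4) with numerator and denominator multiplied by n.
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


whole_numbers = [1, 3, 3, 0, 4, 1]
awkward_mean = [17.2, 18.9, 16.4, 17.7, 19.1]  # hypothetical scores

for data in (whole_numbers, awkward_mean):
    print(variance_definitional(data),
          variance_eq_5_4(data),
          variance_without_mean(data))
```

The three results agree up to floating-point rounding, which is all the derivation claims.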


5.6 

CALCULATION OF THE 
VARIANCE $s_x^2$ 


The calculation of $s_x^2$ by means of Eq. (5.4) will be illustrated on the six 
scores used in the first example above: 1, 3, 3, 0, 4, 1. 


  X       X^2 

  1        1 
  3        9 
  3        9 
  0        0 
  4       16 
  1        1 

 12       36 

Final calculations: 

$$\sum X_i^2 - \frac{(\sum X_i)^2}{n} = 36 - \frac{12^2}{6} = 36 - \frac{144}{6} = 36 - 24 = 12$$ 

$$s_x^2 = \frac{12}{6 - 1} = \frac{12}{5} = 2.4$$ 



When one or more of the possible values a variable can assume occurs 
more than once in a group of scores, a simplification of the calculation of $s_x^2$ 
is possible, as was the case in the calculation of $\bar{X}_.$ (see Sec. 4.7). In the above 
illustration the number 3 appeared twice. The amount contributed to the 
sum of the scores ($\sum X_i$) by the two threes was 3 + 3 = 2(3). The amount 
contributed to the sum of the squared scores ($\sum X_i^2$) by the two threes was 
9 + 9 = 2(9). If the value $X_j$ has a frequency in the group of $f_j$, then this 
value will contribute $f_j X_j$ to the sum of the scores and $f_j X_j^2$ to the sum of the 
squared scores. Consequently, in finding $\sum X_i^2$, for example, it is not 
necessary to square the value $X_j$ each time it occurs. Instead, $X_j^2$ is found once 
and multiplied by the number of times $X_j$ occurs in the group. 

Using the same data upon which the calculation of $\bar{X}_.$ was illustrated 
in Sec. 4.7, we shall find $s_x^2$ by this shortened method, where 

$$s_x^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - \frac{\left(\sum_{j=1}^{k} f_j X_j\right)^2}{n}}{n - 1},$$ 

k is the number of different scores, and $\sum_{j=1}^{k} f_j = n$. 


TABLE 5.1 ILLUSTRATION OF THE CALCULATION OF THE VARIANCE WHEN SOME 
SCORES OCCUR SEVERAL TIMES 


Original scores: 2, 3, 3, 5, 5, 5, 5, 6, 6, 6, 8, 8, 9, 9, 10, 10, 11, 11, 11, 15, 18 

Score     Score squared     Frequency     Intermediate calculations 
  X           X^2               f            fX         fX^2 

  2             4               1             2            4 
  3             9               2             6           18 
  5            25               4            20          100 
  6            36               3            18          108 
  8            64               2            16          128 
  9            81               2            18          162 
 10           100               2            20          200 
 11           121               3            33          363 
 15           225               1            15          225 
 18           324               1            18          324 

Sums                           21           166         1632 

Final calculations: 

$$n = \sum f_j = 21, \qquad \sum f_j X_j = 166, \qquad \sum f_j X_j^2 = 1632$$ 

$$\sum f_j X_j^2 - \frac{(\sum f_j X_j)^2}{n} = 1632 - \frac{166^2}{21} = 1632 - 1312.19 = 319.81$$ 

$$s_x^2 = \frac{319.81}{20} = 15.99$$ 
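The shortened method of Table 5.1 can be sketched in Python by storing each distinct score with its frequency. The dictionary layout and variable names here are assumptions made for illustration; the numbers are those of the table.

```python
# Each distinct score X_j mapped to its frequency f_j (from Table 5.1).
frequencies = {2: 1, 3: 2, 5: 4, 6: 3, 8: 2, 9: 2, 10: 2, 11: 3, 15: 1, 18: 1}

n = sum(frequencies.values())                             # total number of scores
sum_fx = sum(f * x for x, f in frequencies.items())       # sum of the scores
sum_fx2 = sum(f * x * x for x, f in frequencies.items())  # sum of the squared scores

# Frequency form of Eq. (5.4).
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(n, sum_fx, sum_fx2)
print(round(variance, 2))
```

Each square is computed once per distinct score, which is the saving the text describes.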


5.7 

THE STANDARD DEVIATION $s_x$ 


A measure of variability closely related to the variance is the standard 
deviation. The standard deviation, denoted by $s_x$, is defined as the positive 
square root of the variance. To find $s_x$ one first finds $s_x^2$ and then finds the 
square root of $s_x^2$: 

$$s_x = \sqrt{s_x^2}. \qquad (5.5)$$ 

If the variance is 16, what is the standard deviation? $s_x = \sqrt{16} = 4$. 
The standard deviation is often a useful measure of variation because 
in many distributions of scores we know approximately what percent of 
the scores lie within one, two, three, or more standard deviations of the 
mean. For example, we may know that 70% of the scores lie between 
$\bar{X}_. - s_x$ and $\bar{X}_. + s_x$. 

5.8 

SOME PROPERTIES OF THE 
VARIANCE 

Suppose we added a constant number to every score in a set of scores. 
How would the variance of the scores be affected? In Sec. 5.6 we found that 
the scores 1, 3, 3, 0, 4, 1 have variance equal to 2.4. Let's add 2 to each 
score and then calculate $s_x^2$: 


(Original score + 2) - Mean     (Deviation score)^2 

     3 - 4 = -1                         1 
     5 - 4 =  1                         1 
     5 - 4 =  1                         1 
     2 - 4 = -2                         4 
     6 - 4 =  2                         4 
     3 - 4 = -1                         1 

Sum = 24    Sum of deviations = 0      12 
Mean = 4 


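Adding 2 to each score did not change the value of $s_x^2$. In general, 
adding a constant c to each score in a group will not change the variance 
(nor the standard deviation) of the scores: 

$$s_{x+c}^2 = \frac{\sum_{i=1}^{n} \left[ (X_i + c) - \frac{\sum_{j=1}^{n} (X_j + c)}{n} \right]^2}{n - 1} = \frac{\sum_{i=1}^{n} \left[ X_i + c - \frac{\sum_{j=1}^{n} X_j + nc}{n} \right]^2}{n - 1}$$ 

$$= \frac{\sum_{i=1}^{n} (X_i + c - \bar{X}_. - c)^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_.)^2}{n - 1} = s_x^2.$$ 

What would happen to $s_x^2$ if each score were multiplied by a constant, 
say 2? 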
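(Original score x 2) - Mean     (Deviation score)^2 

     2 - 4 = -2                         4 
     6 - 4 =  2                         4 
     6 - 4 =  2                         4 
     0 - 4 = -4                        16 
     8 - 4 =  4                        16 
     2 - 4 = -2                         4 

Sum = 24                               48 
Mean = 4 

$$s^2 = \frac{48}{5} = 9.6$$ 

Note also that 9.6 equals $(2^2)(2.4)$. In general, multiplying each score by a 
constant c makes the variance of the resulting scores equal to $c^2 s_x^2$: 

$$s_{cx}^2 = \frac{\sum_{i=1}^{n} (cX_i - c\bar{X}_.)^2}{n - 1} = \frac{\sum_{i=1}^{n} [c(X_i - \bar{X}_.)]^2}{n - 1} = \frac{\sum_{i=1}^{n} c^2 (X_i - \bar{X}_.)^2}{n - 1}$$ 

$$= \frac{c^2 \sum_{i=1}^{n} (X_i - \bar{X}_.)^2}{n - 1} = c^2 s_x^2.$$ 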
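Both properties can be confirmed numerically. The sketch below uses `variance` from Python's standard `statistics` module, which divides by n - 1 as Eq. (5.3) does; the variable names are illustrative.

```python
import math
from statistics import variance  # sample variance, divisor n - 1

scores = [1, 3, 3, 0, 4, 1]
shifted = [x + 2 for x in scores]   # add the constant 2 to every score
scaled = [2 * x for x in scores]    # multiply every score by the constant 2

print(variance(scores))    # variance of the original scores
print(variance(shifted))   # unchanged by adding a constant
print(variance(scaled))    # multiplied by 2 squared
```

Since the standard deviation is the square root of the variance, it is unchanged by the shift and multiplied by |c| = 2 under the scaling.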

In Chapter 4, the mean of a set of scores formed by pooling scores from 
two separate groups was found to be a simple weighted average of the means 
of the two groups (see Sec. 4.9). The comparable situation for variances 
is more complicated. It will be seen that the variance of the set of scores 
formed by pooling scores from groups a and b depends on both the variances 
and means of the two groups. Notice that if group a comprises the scores 
3, 3, 3, 3 and group b is 6, 6, 6, 6, then the variance of groups a and b 
combined (3, 3, 3, 3, 6, 6, 6, 6) is not zero even though $s_a^2 = s_b^2 = 0$. 

Suppose that a and b denote two separate sets of scores: 


                Group a          Group b 

Group size      $n_a$            $n_b$ 
Mean            $\bar{X}_{.a}$   $\bar{X}_{.b}$ 
Variance        $s_a^2$          $s_b^2$ 


The variance of the group of $n_a + n_b$ scores formed by combining groups 
a and b is 

$$s^2 = \frac{(n_a - 1)s_a^2 + (n_b - 1)s_b^2 + n_a(\bar{X}_{.a} - \bar{X}_{..})^2 + n_b(\bar{X}_{.b} - \bar{X}_{..})^2}{n_a + n_b - 1}, \qquad (5.6)$$ 

where 

$$\bar{X}_{..} = \frac{n_a \bar{X}_{.a} + n_b \bar{X}_{.b}}{n_a + n_b}.$$ 
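Equation (5.6) can be checked against a direct calculation on the pooled scores. The following is a minimal sketch; `combined_variance` and its parameter names are hypothetical, and the data are the two constant groups mentioned above.

```python
import math
from statistics import mean, variance


def combined_variance(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Variance of the pooled scores from group summaries (Eq. 5.6)."""
    grand_mean = (n_a * mean_a + n_b * mean_b) / (n_a + n_b)
    numerator = ((n_a - 1) * var_a + (n_b - 1) * var_b
                 + n_a * (mean_a - grand_mean) ** 2
                 + n_b * (mean_b - grand_mean) ** 2)
    return numerator / (n_a + n_b - 1)


group_a = [3, 3, 3, 3]
group_b = [6, 6, 6, 6]

from_summaries = combined_variance(len(group_a), mean(group_a), variance(group_a),
                                   len(group_b), mean(group_b), variance(group_b))
direct = variance(group_a + group_b)

print(from_summaries, direct)
```

Each group has zero variance, so the whole of the pooled variance comes from the two terms involving the group means.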


5.9 

THE MEAN DEVIATION 

An additional measure of variability, the mean (or average) deviation, is 
somewhat easier to calculate than the standard deviation but is less useful. 
The deviation of each score in a group from the mean is denoted by $X_i - \bar{X}_.$. 
The collection of all n of these deviations is descriptive of the amount of 
variability in the original scores. As we saw in Sec. 4.8, however, the sum 
of these positive and negative deviations is not in the least descriptive of the 
total amount of variability in the group of scores because it is always precisely 
zero. If the deviations are regarded as distances of the scores from $\bar{X}_.$ 
without regard to sign, the sum of these distances is descriptive of the 
variability in the scores. 

The distance of each $X_i$ from $\bar{X}_.$ is found by a process known as taking 
the absolute value of a number. The absolute value of 4.65 is denoted |4.65| 
and equals 4.65. In fact the absolute value of any positive number is that 
positive number itself. Thus |2| = 2, and |105| = 105. The absolute value 
of any negative number is found by changing the minus sign to a plus sign: 
|-3| = 3; |-1.69| = 1.69. Finally, |0| = 0. 

|5| = 5, 

|0| = 0, 

|-5| = 5. 

Another way to think of the process of taking absolute values (though 
one would be foolish to do it this way in calculations) is that $|a| = \sqrt{a^2}$, 
i.e., the absolute value of a number is the positive square root of the square 
of the number! 

The distance of the score $X_i$ from the mean is given by $|X_i - \bar{X}_.|$. The 
average of the n distances of the scores from their mean is called the mean 
deviation, MD. (Do not confuse MD, the mean deviation, with Md, the 
median.) 

$$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}_.|}{n}.$$ 
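The definition of MD translates directly into code. This sketch applies it to the six scores 1, 3, 3, 0, 4, 1 used earlier in the chapter (not to the data of Table 5.2); the function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(scores):
    """Average absolute distance of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


scores = [1, 3, 3, 0, 4, 1]   # mean is 2; absolute deviations are 1, 1, 1, 2, 2, 1
print(mean_deviation(scores))
```

For these scores the absolute deviations sum to 8, so MD = 8/6, while the signed deviations sum to zero as noted above.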

There is no simpler expression for the mean deviation that might be used 
for purposes of calculation. An illustration of the calculation of MD appears 
in Table 5.2. The mean deviation has not often been used as a measure of 
variability, even though it is easily calculated and has a logical simplicity. 
The reason for this is that the mean deviation does not have a theoretical 
importance comparable to that of the variance, for example. The mathematician 
finds that the process of “taking absolute values” presents difficulties in 
derivations. The mean deviation is identified and discussed at this point, 
however, because of the role it may play in testing hypotheses about 
variables in populations. 


TABLE 5.2 ILLUSTRATION OF THE CALCULATION OF THE MEAN DEVIATION 

Scores     X - X.             |X - X.| 

 10        10 - 12 = -2          2 
 12        12 - 12 =  0          0 
 10        10 - 12 = -2          2 
 15        15 - 12 =  3          3 
 ... 

$\sum_{i=1}^{n} X_i = 60$ 

$\bar{X}_. = 12$ 


5.10 

STANDARD SCORES 


It is often meaningful to describe the position of a score in a set of scores 
relative to the mean, measured in standard deviation units. For example, 
suppose a set of scores for 100 persons has a mean of 18.75 and a 
standard deviation of 2.60, and one of these 100 persons has a score of 20. 
The position of this score in the collection of 100 scores is not immediately 
apparent. A simple calculation shows that 20 lies 1.25 units (20 - 18.75) 
above the mean. Suppose each of the 100 scores is transformed by 

$$\frac{X_i - 18.75}{2.60}.$$ 

What would be the mean and standard deviation of the 100 transformed scores? 
The mean of the 100 values of $X_i - 18.75$ is equal to the mean of the 
original X scores minus 18.75, since subtracting a constant from each 
score subtracts that same constant from the mean. Since the mean of the X 
scores is 18.75, the mean of $X_i - 18.75$ must be zero. If $X - 18.75$ has a 
Aside from being a convenient means of communicating the position 
of a person's score relative to the mean, measured in standard deviation 
units, z scores are a step toward transforming a set of X scores to an 
arbitrary scale with a convenient mean and standard deviation. For some 
purposes, for example, the z scores themselves may not be convenient. The 
negative scores might be bothersome, and a collection of z scores will 
undoubtedly be full of decimal numbers. A transformation of the z scores 
can eliminate these minor problems. 

Since the n z scores have mean zero and standard deviation 1, we know 
that cz, formed by multiplying each z score by the constant c, will have a 
standard deviation of |c|, and cz + d will have a mean of 

$$c\bar{z} + d = c(0) + d = d.$$ 

For example, suppose the z scores for a set of 250 X scores with mean 79.65 
are transformed so that their mean is 50 and their standard deviation is 10; 
the transformed scores are 10z + 50. A person whose X score of approximately 
85 lies 9/10 of a standard deviation above the mean has a transformed score 
of 10(0.9) + 50 = 59. The transformed score of 59 is more immediately 
informative than the X score of 85. 

A set of z scores can be placed along any convenient scale. They can be given 
a desired mean and standard deviation merely by letting c be the desired 
standard deviation and d the desired mean in cz + d. Intelligence-test 
scores are often transformed to a scale with mean 100 and standard deviation 
15 or 16. T scores are formed by 10z + 50. These and other popular scales 
are shown in Fig. 6.5. 


5.11 

SKEWNESS 


The symmetry or asymmetry of a frequency distribution is one of its more 
important properties. Perfectly symmetric frequency polygons and histograms 
almost never occur with real data. The degree to which the frequency 
distribution of a group of scores is asymmetric is its skewness. The nature 
and extent of asymmetry will be apparent when a graph of the distribution 
is observed, but graphing is not always possible or convenient. 
Consequently, various summary statistics have been devised to measure the 
asymmetry of a group of scores. 
The best measure of the skewness of a set of scores has the formula 

$$\text{skewness} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_.)^3 / n}{s_x^3}. \qquad (5.7)$$ 

As was seen in Sec. 5.10, it is standard practice to denote the directed distance 
which a score lies from the mean of the group it is in, measured in units of 
the standard deviation of that group, by $z_i$; i.e., 

$$z_i = \frac{X_i - \bar{X}_.}{s_x}.$$ 
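The z-score transformation recalled here, together with the rescaling cz + d from Sec. 5.10, can be sketched as follows. The helper names `z_scores` and `rescale` and the sample data are assumptions made for illustration; `stdev` from the standard `statistics` module uses the divisor n - 1, matching $s_x$.

```python
import math
from statistics import mean, stdev


def z_scores(scores):
    """Distance of each score from the mean, in standard deviation units."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]


def rescale(z, desired_sd, desired_mean):
    """Form c*z + d with c the desired standard deviation and d the desired mean."""
    return [desired_sd * value + desired_mean for value in z]


scores = [1, 3, 3, 0, 4, 1]
z = z_scores(scores)
t = rescale(z, 10, 50)   # T-score scale: 10z + 50

print([round(value, 2) for value in z])
print([round(value, 1) for value in t])
```

The z scores have mean 0 and standard deviation 1 whatever the original scale, and the rescaled scores take on the requested mean and standard deviation.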

If we recognize that 

$$\frac{(X_i - \bar{X}_.)^3}{s_x^3} = \left( \frac{X_i - \bar{X}_.}{s_x} \right)^3 = z_i^3,$$ 

then the measure of skewness in (5.7) becomes 

$$\text{skewness} = \frac{\sum_{i=1}^{n} z_i^3}{n} = \overline{z^3}. \qquad (5.8)$$ 

Thus, one measure of skewness is simply the average of the z scores 
which have been raised to the third power. (In mathematical statistical work 
this measure of skewness is denoted by $\sqrt{b_1}$. The measure is due to Karl 
Pearson and its properties have been widely studied.) 
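Equation (5.8) is short enough to sketch directly. The function name and both data sets below are hypothetical; the second set is built to have a long upper tail.

```python
from statistics import mean, stdev


def skewness(scores):
    """Average of the cubed z scores (Eq. 5.8), with s_x computed using n - 1."""
    m = mean(scores)
    s = stdev(scores)
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)


symmetric = [1, 2, 3, 4, 5]                            # hypothetical symmetric scores
long_upper_tail = [13, 14, 14, 15, 15, 16, 17, 19, 22]  # hypothetical, tail to the right
long_lower_tail = [35 - x for x in long_upper_tail]     # mirror image of the above

print(skewness(symmetric))
print(skewness(long_upper_tail))
print(skewness(long_lower_tail))
```

The sign of the result follows the direction of the longer tail, because cubing preserves the sign of each z score while magnifying the large ones.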

Suppose the skewness of the two distributions in Fig. 5.3 is being 
measured. The mean of the scores for distribution A in Fig. 5.3 is about 16. 



FIG. 5.3 Two skewed frequency distributions. 


When the z scores for A are formed there will be some very large positive 
ones (because the largest X score is 22, which is 6 units above the mean) 
that will be larger in absolute value than any of the negative ones (the smallest 
X score, 13, lies only 3 units below the mean). Now the algebraic sign of a 
number remains the same when it is cubed; $(-2)^3 = -8$. Hence, for 
distribution A, the contribution of the cubed negative z scores to $\sum z_i^3 / n$ will be 
less than the contribution of the cubes of those large positive z scores. 
Consequently, the value of $\sum z_i^3 / n$ in Eq. (5.8) will be large and positive. 
We say that distribution A is positively skewed because its measure of skewness 
is positive. A positively skewed frequency distribution has large scores that extend 
further above the mean than the small scores extend below it. 

Distribution B in Fig. 5.3 is negatively skewed. The value of $\sum z_i^3 / n$ 
for distribution B is negative. Try to convince yourself that this is true. 
Distribution A is more markedly positively skewed than distribution B is 
negatively skewed. 


In a symmetric distribution the value of the measure of skewness in 
Eq. (5.8) is zero. This is true because exact symmetry implies that every 
negative z score can be paired with a positive z score of equal size. Since 
the cubes of negative scores are still negative, the sum of the cubes of each 
pair of positive and negative z scores is zero. 

Unfortunately, the value of $\sum z_i^3 / n$ is laborious to obtain, even on a 
desk calculator, for a large group of scores. A quick method for 
measuring the skewness of a distribution grows out of the fact that the mean 
is pulled further toward the extreme scores than the median. In the chapter on 
measures of central tendency we saw that in unimodal positively skewed 
distributions the mean is larger than the median, which is in turn larger than 
the mode. In negatively skewed distributions, the mean is smaller than the 
median. The difference between the mean and the median therefore tells 
something about the skewness of the distribution. This is indeed so in 
reasonably large groups of scores, say n of 50 or greater. A simple measure of 
skewness making use of these facts has the following definition: 


$$\text{skewness} = \frac{3(\bar{X}_. - Md)}{s_x}. \qquad (5.9)$$ 


The skewness of a distribution can be measured by taking 
three times the difference between the mean and median, divided by the 
standard deviation. The values of Eq. (5.9) will generally range between 
-3 and +3. When a distribution is symmetric, the value of Eq. (5.9) is zero. The 
measure of skewness in Eq. (5.9) can be used to compare the skewnesses of 
different distributions because division by $s_x$ has made the measure 
independent of the variability of the distribution. 
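Equation (5.9) needs only the mean, the median, and the standard deviation, as this sketch shows. The function name and the small data sets are hypothetical; the text recommends the measure for groups of 50 or more scores, so the tiny sets here only illustrate the sign of the result.

```python
from statistics import mean, median, stdev


def median_skewness(scores):
    """Three times (mean - median), divided by the standard deviation (Eq. 5.9)."""
    return 3 * (mean(scores) - median(scores)) / stdev(scores)


symmetric = [1, 2, 3, 4, 5]
long_upper_tail = [13, 14, 14, 15, 15, 16, 17, 19, 22]
long_lower_tail = [35 - x for x in long_upper_tail]

print(median_skewness(symmetric))
print(round(median_skewness(long_upper_tail), 2))
print(round(median_skewness(long_lower_tail), 2))
```

In the set with the long upper tail the mean (about 16.1) exceeds the median (15), so the measure is positive; in the mirror-image set it is negative by the same amount.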


5.12 

KURTOSIS 


The properties of distributions considered thus far are central tendency, 
variability, and symmetry. A fourth property completes the set of features of 
distributions of scores that are generally of interest in analyzing data. One 
may wish to know something about how peaked or flat a frequency polygon or 
histogram is. Kurtosis is a Greek word and refers to the quality of 
“peakedness” of a curve. (Karl Pearson is credited with formalizing the concept 
of kurtosis in statistics and proposing a method of measuring it.) 

FIG. 5.4 “Peaked,” “flat,” and “mesokurtic” curves (A, B, and C, respectively). 

In Fig. 5.4 three curves differing in “peakedness” or kurtosis appear. 
The first (A) is quite peaked; such a curve is called leptokurtic. (The prefix 
“lepto” means “slender” or “narrow.”) The second curve (B) is relatively 
flat; such curves are called platykurtic. (The prefix “platy” means “flat” or 
“broad.”) The “peakedness” or degree of kurtosis of the third curve (C) 
is a standard against which the kurtosis of other curves is measured. 
The third curve in Fig. 5.4 is the normal curve, which will be discussed at 
length in Chapter 6, and is said to be mesokurtic, “meso” meaning 
“intermediate.” 

We shall now learn how the statistician measures the kurtosis of a curve. 
First it is necessary to point out, however, that the concept of kurtosis 
applies only to unimodal distributions and concerns the steepness of the 
curve in the vicinity of the single mode. (If a distribution has two modes, 
it would be acceptable to talk about the kurtosis of the curve in the vicinity 
of each mode.) 

The customary measure of kurtosis has the following definitional 
formula: 

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_.)^4 / n}{s_x^4}. \qquad (5.10)$$ 

If we recognize that $(X_i - \bar{X}_.)^4 / s_x^4$ is simply 

$$\left( \frac{X_i - \bar{X}_.}{s_x} \right)^4 = z_i^4,$$ 

then we see that a measure of kurtosis is given by the formula 

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} z_i^4}{n} = \overline{z^4}. \qquad (5.11)$$ 

That is, kurtosis is measured by taking the average of the fourth powers of 
the z scores. The relationships between the size of the kurtosis statistic 
and the “peakedness” of the distribution on which it is calculated are recorded 
in Table 5.3. 
TABLE 5.3 RELATIONSHIP OF THE VALUE OF THE KURTOSIS STATISTIC TO 
THE “PEAKEDNESS” OF THE FREQUENCY DISTRIBUTION 

Nature of                   Description of       Value of kurtosis 
distribution                “peakedness”         statistic (Eq. 5.11) 

Normal, e.g., curve C       Mesokurtic           3 
in Fig. 5.4 

Peaked, e.g., curve A       Leptokurtic          Greater than 3 
in Fig. 5.4                                      (can become very large) 

Flat, e.g., curve B         Platykurtic          Less than 3 
in Fig. 5.4                                      (must be zero or greater) 
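The kurtosis statistic of Eq. (5.11) can be sketched in the same way as the skewness statistic. The function name and both data sets are hypothetical: one set spreads its scores evenly over a range, the other concentrates them at the center with a few extreme values.

```python
from statistics import mean, stdev


def kurtosis(scores):
    """Average of the fourth powers of the z scores (Eq. 5.11)."""
    m = mean(scores)
    s = stdev(scores)
    return sum(((x - m) / s) ** 4 for x in scores) / len(scores)


flat = list(range(1, 22))          # 1, 2, ..., 21: evenly spread scores
peaked = [-10] + [0] * 18 + [10]   # most scores at the center, two extreme ones

print(round(kurtosis(flat), 2))
print(round(kurtosis(peaked), 2))
```

The evenly spread set falls below the normal-curve value of 3 and the concentrated set falls well above it, in line with Table 5.3.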


PROBLEMS AND EXERCISES 


1. Calculate the inclusive range, variance, standard deviation, and mean deviation 
of the following set of scores: 

102 112 116 

106 114 119 

m 115 no 

112 115 122 

(Hint: To simplify calculations, first subtract 100 from all scores; this will not 
change the values of any of the measures of variability.) 

2 ' fic^Sut'oT 1 '' ,0r "" ,Q “°" 5 ta «“ «— « *“P- 


Score interval     Frequency     Cumulative frequency 

150-159                5               100 
140-149                7                95 
130-139                9                88 
120-129               12                79 
110-119               17                67 
100-109               21                50 
 90-99                12                29 
 80-89                 8                17 
 70-79                 6                 9 
 60-69                 1                 3 
 50-59                 2                 2 


The median of the above set of scores is 1 tw s r» . 


proportion of 
to the median. 
3. Find $s_x^2$ and $s_x$ for the following grade-placement scores arranged in an ungrouped 
frequency distribution: 

Score     Frequency 

 6.9          2 
 6.8          4 
 6.7          5 
 6.6          9 
 6.5         14 
 6.4         10 
 6.3          6 
 6.2          3 
 6.1          2 
 6.0          1 

(Hint: Subtract 6.0 from each of the ten score values to simplify calculations.) 

4. Indicate whether each of the following distributions of measures is probably 

negatively or positively skewed: 

a. ages of students in U.S. universities; 

b. numbers of children in U.S. families; 

c. population of cities in the United States; 

d. ages at death of females in the United States. 


5. Group A     Group B 

     13          28 
     11          26 
     10          25 
      9          24 
      7          22 


The variances of both groups A and B equal 5. Will the variance of the ten 
scores formed by pooling together groups A and B be less than, greater than, or 
equal to 5? 

6. For the 290 students taking a 50 item social studies achievement test the mean 
is 32.50 and the standard deviation is 4.80. Find the z scores corresponding to 
the following: 

     Test score X     z score 

a.       28            -0.94 
b.       36 
c.       45 
d.       20 
7. Two vocabulary tests were administered to the 40 students in a beginning French 
class. The means, the standard deviations on the two tests, and the raw scores 
obtained by students A and B were as follows: 

           $\bar{X}_.$     $s_x$     A's scores     B's scores 

Test 1     54.10     14.28         45             60 
Test 2     21.25      3.52         30             21 

a. Which student has the larger total raw score on tests 1 and 2? 

b. Calculate the z scores for both students on each test. 

c. Which student has the larger total of his two z scores? 

d. Which student has a better knowledge of French vocabulary? (This question 
has no easy answer; attempt to raise the issues involved.) 

8. Prove that $\sum_{i=1}^{n} z_i^2 = n - 1$. 
THE NORMAL 
DISTRIBUTION 


6.1 

INTRODUCTION 

This chapter is an interruption, but a very necessary one. It must stand alone 
and fail to be immediately integrated into the statistics we have developed 
to this point because we are attempting here to gain knowledge of techniques 
that rest on essential concepts we shall not present in detail because of their 
complexity. This is the price that all of us pay for attempting to master 
the upper layer of statistics before establishing a foundation in bedrock. 
Obviously, we feel that the present venture is worth the price. 


6.2 

HISTORY OF THE NORMAL 
DISTRIBUTION 

The brief history of the discovery and study of the normal distribution given 
here does not do the topic justice. Even the mathematically naive student 
will find Helen Walker's account of the history of the normal distribution 
informative and rewarding reading (Walker, 1929, Chap. II). 
In Europe in the seventeenth century, a handful of mathematicians were 
pursuing small, private researches which would one day be incorporated 
into the theory of probability. (See Chapter 10.) These studies, by such 
men as Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665), were 
undertaken at the request of the Chevalier de Méré, a gambler for whom the 
nature of chance must have been a particularly urgent question. 

One of the single greatest events in the early history of probability was 
the publication in 1713 of Ars Conjectandi by the Swiss mathematician 
Jacob Bernoulli (1654-1705). A central issue in probability theory during 
its infancy was determining the probability that an event would occur some 
number of times if it were given several independent opportunities to occur. 
For example, if a fair coin is flipped 20 times, what is the probability that 
15 “heads” will occur? Or, if a die is cast 10 times, what is the probability 
that the face of the die on which six dots appear will turn up exactly twice? 
The solutions to these problems were known at the time Ars Conjectandi was 
published (we will develop the solutions in Chapter 10), and the formal 
properties of these solutions were the main concern of this great work. 
The computations involved in obtaining the solutions for large numbers of 
trials, however, are prohibitive. No reasonable person would attempt to 
calculate exactly the probability that 10,000 tosses of a coin will result in 
8000 or more “heads.” Even though it would be clear to him what calculations 
are necessary, they would be too laborious to carry out. 

Much mathematical activity in the early eighteenth century was directed toward 
finding approximations to many of the calculations that such problems 
involved. Stirling published a formula for approximating the product of the 
first n integers, a term that appears often in probability theory. A general 
problem remained: How does one approximate the probability that n independent 
trials of an event with probability p of producing one (a “success”) of its two 
outcomes will yield r “successes”? The man most equal to the task at the time 
was Abraham De Moivre (1667-1754). 

Before turning to De Moivre's result, let us take a closer look at the 
problem he sought to solve. Suppose a coin is to be flipped 10 times. 
Assume that the coin is fair, i.e., that it is as likely to come up “heads” as 
“tails.” We can ask, what is the probability that no “heads” will result? or 
what is the probability that 1 “head” will result? ... what is the probability 
that 10 “heads” will result from the 10 flips? The exact answers to all eleven 
of these questions can be calculated. (Even for just 10 tosses the 
calculations are already becoming laborious; for as many as 1000 tosses of 
a coin the calculations border on the impossible.) The probabilities that 
0, 1, 2, ..., 9, or 10 “heads” will result in 10 flips of a fair coin are graphed 
in Fig. 6.1. 

The problem De Moivre sought to solve was how to find a mathematical 
curve that would approximate closely the curve obtained by connecting the 
tops of the columns in Fig. 6.1. If such a curve could be found, the nearly 
impossible calculations of the probability problem could be replaced by 
simply reading off points on mathematical curves or looking up numbers 
in a mathematical table. 



FIG. 6.1 Graph of the distribution of probabilities of obtaining certain 
numbers of “heads” in 10 flips of a fair coin. 


De Moivre was able to show that a mathematical curve which came very 
close to the curve that connects the tops of the lines in Fig. 6.1 (and the curve 
for almost any other such problem) has the following formula: 

$$u = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(X - \mu)^2 / (2\sigma^2)}, \qquad (6.1)$$ 

where u is the height of the curve directly above any given value of X in the 
distribution, 
$\pi$ is the ratio of the circumference of any circle to its diameter, which 
is approximately 3.1416, 
e is the base of the system of natural logarithms, approximately 2.7183, and 
$\mu$ and $\sigma$ are numbers that locate the curve along the number line and 
control its spread. 

Equation (6.1) is the formula for the normal distribution. It is certainly 
imposing, but do not let that frighten you. We shall have little to 
do with the formula as such. 
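To see how closely the curve of Eq. (6.1) follows the coin-flipping probabilities, the sketch below tabulates both for 10 flips of a fair coin. Two things here are assumptions not stated in the passage: the exact probabilities are computed with the standard binomial formula (the text defers it to Chapter 10), and the curve is located with $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$, the usual choice for this approximation. Function names are illustrative.

```python
from math import comb, exp, pi, sqrt


def binomial_probability(r, n, p):
    """Exact probability of r successes in n independent trials (binomial formula)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def normal_height(x, mu, sigma):
    """Height u of the normal curve above x, as in Eq. (6.1)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


n, p = 10, 0.5
mu = n * p                       # assumed location of the approximating curve
sigma = sqrt(n * p * (1 - p))    # assumed spread of the approximating curve

for r in range(n + 1):
    exact = binomial_probability(r, n, p)
    approx = normal_height(r, mu, sigma)
    print(r, round(exact, 4), round(approx, 4))
```

Even with only 10 flips the two columns differ by less than .01 at every value of r, which is the kind of agreement that made the curve useful as a substitute for the exact calculations.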



6.3 

THE NORMAL CURVE 


The graph of Eq. (6.1) yields the familiar, symmetric, bell-shaped curve 
known as the normal curve. We speak of a normal curve, because Eq. (6.1) 
imparts a characteristic shape to the graph. However, by changing the values 
of $\mu$ and $\sigma$ we can move the normal curve up and down the scale and change 
its spread. The value $\mu$ corresponds to the mean of a large frequency 
distribution that would look like the normal curve, and $\sigma$ corresponds to the 
standard deviation of the distribution. In Fig. 6.2, the graph of the normal 
distribution for $\mu = 0$ and $\sigma = 1$ appears. We shall use the letter z as the 
symbol for a normally distributed variable with $\mu = 0$ and $\sigma = 1$. 

The curve in Fig. 6.2 does not meet the z-axis at the points 3 and -3; 
in fact, although it gets closer and closer to the z-axis as z gets larger than 
3, it never touches the axis. The highest point on the curve is above the 
z value of 0; at this point u is approximately .3989. Notice that the curve 
is symmetric around a vertical line drawn through $z = \mu = 0$. The normal 
curve is always symmetric. The area of the space under the curve and above 
the z-axis is 1. The skewness of the normal curve is zero because the curve 
is symmetric around its mean. The kurtosis of the normal curve, i.e., the 
average of the fourth powers of the z scores, is 3. As noted earlier, 
distributions with a measure of kurtosis greater than 3 are more peaked than 
the normal distribution and are said to be leptokurtic. The normal curve has 
a point of inflection at exactly a distance $\sigma$ on either side of the mean 
(at z = -1 and z = +1 in Fig. 6.2). 

The normal curve in Fig. 6.2 is a special one because it has been chosen 
as a standard. It is called the unit normal curve because the area under the 
curve is 1. Its mean and standard deviation are 0 and 1, respectively, 
and any other normal curve can be moved along the number scale and 
stretched or compressed by a simple transformation (subtract the mean and 
divide by $\sigma$) so that it coincides with the unit normal curve. 

It will be necessary to find the ordinate u (the height of the curve 
above the z-axis) for any value of z in the unit normal curve or the area under 
the curve between any two values of z. Solving Eq. (6.1) for u when z is 
given is far too inconvenient; and although we know that the area under 
the curve from $z = -\infty$ to $z = +\infty$ is 1, the area between any other two 
values of z is difficult to find. 
Table B in the Appendix gives the area under the unit normal curve to the 
left of any point on the z-axis from -3.00 to +3.00. Also given in Table 
B is the ordinate u of the unit normal curve for values of z from -3.00 to 
+3.00. 

A few examples will illustrate the use of Table B. Suppose one desires 
to find the area under the unit normal curve to the left of z = -2.50. The 
value -2.50 is found in the first column of Table B. To the right of this 
entry in the second column, titled “Area,” the number .0062 is found. Thus 
only 62 ten-thousandths of the area under the unit normal curve is contained 
to the left of z = -2.50. The height of the unit normal curve at the point 
z = -2.50 is found in the “Ordinate” column to the right of the “Area” 
column. For z = -2.50, u = .0175. Check that you can find the following 
areas and ordinates correctly in Table B in the Appendix. 

Value of z     Area to the left of z     Ordinate at z 

   0.00               .5000                  .3989 
   0.50               .6915                  .3521 
  -1.27               .1020                  .1781 
   1.96               .9750                  .0584 


Since the total area under the curve is 1, the above areas (but not the 
ordinates) can be read as proportions or percents of the total: 97.5% of the 
area under the unit normal curve lies to the left of 1.96. 

Table B is also used to find the area under the unit normal curve between 
two values of z. For example, the area to the left of z = -1.27 is .1020 
and the area to the left of z = 0.50 is .6915. Therefore, the area between 
-1.27 and 0.50 is .6915 - .1020 = .5895. In other words, about 59% of 
the area lies between these two points. This is illustrated in Fig. 6.3. 
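The entries read from Table B above can be reproduced numerically. This sketch is not how the table was built; it simply uses the error function `erf` from Python's standard `math` module, through the identity that the area to the left of z equals $\tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The function names are illustrative.

```python
from math import erf, exp, pi, sqrt


def area_left(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def ordinate(z):
    """Height u of the unit normal curve at z (Eq. 6.1 with mu = 0, sigma = 1)."""
    return exp(-z * z / 2) / sqrt(2 * pi)


for z in (0.00, 0.50, -1.27, 1.96, -2.50):
    print(z, round(area_left(z), 4), round(ordinate(z), 4))

# Area between two z values: subtract the two areas to the left.
between = area_left(0.50) - area_left(-1.27)
print(round(between, 4))
```

The area between -1.27 and 0.50 computed this way agrees with the table-based figure of .5895 to within rounding of the table entries.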
FIG. 6.3 Determining the area under the unit normal curve between 
two values of z. 


6.4 

THE FAMILY OF NORMAL 
CURVES 


Actually there are infinitely many normal curves, a different one for eac ^ 
different pair of values for y and a. The curve in Fig. 6.4 has a mean y 
of 20 and a standard deviation a of 5, but it is a normal curve nonetheless. 

What do all of these normal curves have in common? For our purposes, 
their most important common property is the amount of area under the 
curve between any two points expressed in standard deviations. For ex- 
ample, in any normal distribution approximately 

1. 68% of the area under the curve lies within one $\sigma$ of the mean
either way (i.e., $\mu \pm 1\sigma$),

2. 95% of the area under the curve lies within two $\sigma$'s of the mean, and

3. 99.7% of the area under the curve lies within three $\sigma$'s of the mean $\mu$.

You can check these relationships and obtain the exact areas by finding
the areas under the unit normal curve between -1 and +1, -2 and +2,
and -3 and +3 in Table B of the Appendix.
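
As a check on the three approximate percentages above, here is a short sketch, again assuming `statistics.NormalDist` in place of Table B. The function name `area_within` is hypothetical. The same areas result for any mean and standard deviation, which is the point of this section.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under a normal curve within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    unit = area_within(k)                     # unit normal curve
    fig_6_4 = area_within(k, mu=20, sigma=5)  # the curve of Fig. 6.4
    print(f"within {k} sigma: unit normal {unit:.4f}, mean 20 and sd 5 {fig_6_4:.4f}")
```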

The normal curve and its relationship to various transformed scales 
which are widely used in educational and psychological measurement appear 
in Fig. 6.5. 



FIG. 6.4 The normal curve for $\mu = 20$ and $\sigma = 5$.



6.5 

THE UNIT NORMAL 
DISTRIBUTION AS A 
STANDARD 

The value of z for the unit normal distribution locates a point z units from
the mean. It will be most useful in the future if all references to scores in
normal distributions are in terms of deviations from the mean $\mu$, in standard
deviation $\sigma$ units. For almost any application of the normal curve, we shall
want to know how many standard deviations a score lies above or below the
mean. Knowing this, questions about the area between points on any normal
curve or heights of the curve above any point can be answered by reference
to the unit normal curve. The deviation of a score from its mean is $X - \mu$;
the number of standard deviations X lies from its mean is $(X - \mu)/\sigma$.
The quantity $(X - \mu)/\sigma$ is called the unit normal deviate. If X has a
normal distribution with mean $\mu$ and standard deviation $\sigma$, then
$(X - \mu)/\sigma$ has the unit normal distribution, but not otherwise.

The shape of the normal curve does not change when we subtract $\mu$
and divide by $\sigma$. If we would like to know what proportion of the area lies
to the left of a score of 20 in a normal distribution with mean 25 and standard
deviation 5, we can translate this question into “What proportion of the area
lies to the left of (20 - 25)/5 = -1 in the unit normal distribution?”

These points are summarized in the following statement: 

If X has a normal distribution with mean $\mu$ and standard
deviation $\sigma$, then $z = (X - \mu)/\sigma$ has a normal distribution with
mean 0 and standard deviation 1, i.e., $z = (X - \mu)/\sigma$ has the unit
normal distribution.

The area between $X_1$ and $X_2$ in the normal distribution with
mean $\mu$ and standard deviation $\sigma$ is the same as the area between
$z_1 = (X_1 - \mu)/\sigma$ and $z_2 = (X_2 - \mu)/\sigma$ in the unit normal
distribution.
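
The summary statement can be illustrated with the example used earlier in this section (a score of 20 in a normal distribution with mean 25 and standard deviation 5). This is a sketch under the assumption that `statistics.NormalDist` replaces the table lookup; `z_score` is an illustrative helper name.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


mu, sigma = 25, 5
z = z_score(20, mu, sigma)

# Area to the left of 20 in the original distribution ...
area_direct = NormalDist(mu, sigma).cdf(20)
# ... equals the area to the left of z in the unit normal distribution.
area_via_z = NormalDist(0, 1).cdf(z)

print(f"z = {z}")
print(f"area in original units = {area_direct:.4f}")
print(f"area in z units        = {area_via_z:.4f}")
```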

6.6 

USES OF THE NORMAL CURVE 

De Moivre invented the normal curve for a particular use; namely, to 
provide an easy, approximate solution to applications of probability theory. 
Surely he was never aware that his discovery would find applications in 
practically every corner of science that now exists. Indeed, the wide appli- 
cation and occurrence of the normal distribution are a wonder. 

The normal distribution plays an important role in both descriptive
and inferential statistics. We must defer any discussion of the normal 
distribution in inferential statistics until the later chapters in this text. 

FIG. 6.6 Frequency polygon for heights of 8585 adult men born in Great
Britain during the nineteenth century. (Horizontal axis: height in inches.)

The normal curve is an excellent approximation to the frequency
distributions of large numbers of observations taken on a variety of variables.
The frequency polygons of heights of adult males and of adult females both
closely resemble the normal curve. The frequency polygon in Fig. 6.6 shows the
distribution of heights of 8585 adult men born in Great Britain during the
nineteenth century.

The frequency polygon in Fig. 6.6 is based on the frequency distribution
shown in Table 6.1 (see Rugg, 1917):


TABLE 6.1 FREQUENCY DISTRIBUTION FOR FIG. 6.6

Height (in.)   Frequency

    58               4
    59              14
    60              41
    61              83
    62             169
    63             394
    64             669
    65             990
    66            1223
    67            1329
    68            1230
    69            1067
    70             646
    71             392
    72             202
    73              79
    74              32
    75              16
    76               5
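
One rough way to see how closely the heights in Table 6.1 follow a normal curve is to compare the observed proportion within one and two standard deviations of the mean with the 68% and 95% figures of Sec. 6.4. The sketch below is only an illustration: it treats each tabled height as the exact value for every man in that row (an assumption, since the table groups heights to the inch), and the variable names are not from the text.

```python
import math

# Table 6.1: heights in inches and their frequencies.
heights = list(range(58, 77))
freqs = [4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1067, 646, 392, 202, 79, 32, 16, 5]

n = sum(freqs)
mean = sum(h * f for h, f in zip(heights, freqs)) / n
variance = sum(f * (h - mean) ** 2 for h, f in zip(heights, freqs)) / (n - 1)
sd = math.sqrt(variance)


def proportion_within(k):
    """Observed proportion of men whose height is within k sd's of the mean."""
    count = sum(f for h, f in zip(heights, freqs) if abs(h - mean) <= k * sd)
    return count / n


print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
print(f"within 1 sd: {proportion_within(1):.3f}")
print(f"within 2 sd: {proportion_within(2):.3f}")
```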


Psychometric tests of general and special mental abilities often yield
distributions of scores that conform closely to the normal distribution. It
is fairly well known that IQ's from the Stanford-Binet Intelligence Test are
nearly normally distributed, with a mean of 100 and a standard deviation of
16, for people in general (see Fig. 6.7). Tests of educational
attainment which are constructed in accord with the same psychometric
principles upon which ability tests rest usually produce frequency polygons
that resemble the normal curve. Measures of human ability in general are
frequently nearly normally distributed. Deciding whether some ties can be
drawn between this fact and the relationship of the normal curve to the
probability distribution of groups of binomial trials for chance events is a
task we leave gladly to the more philosophically inclined.

FIG. 6.7 Distribution of Stanford-Binet IQ scores. From L. M. Terman
and M. A. Merrill, Measuring Intelligence (Boston: Houghton Mifflin, 1937).

Somehow the misapprehension arises in the minds of many students that 
there is a necessary link between the normal distribution — an idealized de- 
scription of some frequency distributions — and practically any data they 
might collect. The normal curve is a mathematician’s invention that is a 
reasonably good description of the frequency polygon of measurements on 
several different variables. A collection of scores that are exactly normally 
distributed has never been gathered and never will be. But much is gained 
if we can tolerate the slight error in the statement and claim from time to 
time that scores on a variable are “normally distributed.” (On this point, 
see Boring, 1920, and Kelley, 1923.) 

The mathematical statistician has greatly contributed to the eminence 
of the normal distribution. Although many different mathematical curves 
would fit empirical frequency polygons tolerably well, special mathematical 
advantages are gained when the normal curve can be assumed to “fit” the
data. Certain mathematical properties of the normal curve, Eq. (6.1),
produce simple and elegant solutions to many inferential statistical problems.
Without the normal curve, mathematical statisticians would have to labor 
under extreme complications that arise when other mathematical curves are 
used to represent data. 



6.7 

THE BIVARIATE NORMAL 
DISTRIBUTION 


The theory of correlation, which will be the subject of Chapter 7, has close
historical ties with the normal distribution and the bivariate normal
distribution. One preoccupation of statistics since its inception as a formal
discipline has been the description of the way variables are related. Do tall
[...]? Will a plot of land produce a higher yield of corn if we increase the
amount of nutrients in the soil? Are bright children less [...] children?
Each of these questions can be studied abstractly by describing the way in
which scores on one variable X are related to scores on a second variable Y
for the same persons. Thus, these questions concern bivariate relationships,
i.e., relationships between two variables.

If we measure each person in a group on two variables, e.g., we might
measure each person's IQ (X) and physical strength (Y), the data can be
represented in a bivariate frequency distribution. For each person there is a
pair of scores, his score on X and his score on Y. A bivariate frequency
distribution is a picture of the frequency with which different pairs of X and
Y scores occur. The accompanying figure is a bivariate frequency distribution
for a group of persons measured on IQ (X) and physical strength (Y). The
frequency of persons scoring 125 on the IQ variable X and 30 on the physical
strength variable Y is shown by the height of the bar at the intersection of
125 and 30, which must be measured against the vertical scale of frequencies.

A large number of bivariate frequency distributions built from data
gathered in educational and psychological settings show a characteristic 
shape. A surface drawn through the end points of the columns which 
represent the frequencies in a bivariate frequency distribution often looks 
like a bell— in three dimensions— that has been stretched in the X and Y 
directions and rotated around its center in the X-Y plane. Much was to 
be gained if the mathematical statistician could find a set of mathematical
curves that gave a good description of many bivariate frequency distributions. 
The mathematical surface which was found to accomplish this objective is 
called the bivariate normal distribution. This smooth, continuous, bell- 
shaped surface provides a mathematically convenient and satisfactory 
representation of numerous bivariate frequency distributions. 



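Some properties of the bivariate normal distribution are as follows:

1. The distribution of the X scores, disregarding Y, is a normal distribution.

2. The distribution of the Y scores, disregarding X, is a normal distribution.

3. For each single score on X, the associated Y scores are normally
distributed, with a variance that is the same for every score on X.

4. For each single score on Y, the associated X scores are normally
distributed, with a variance that is the same for every score on Y.

5. The means of the Y scores for each single score on X fall on a straight
line, and the means of the X scores for each single score on Y fall on a
straight line.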

We shall refer to the bivariate normal distribution on several occasions,
but we shall not explore further properties of it. For a more advanced 
discussion see Walker and Lev (1953, pp. 248-49). 

PROBLEMS AND EXERCISES 

1. Let z stand for the unit normal variable, i.e., the normally distributed variable
with mean 0 and standard deviation 1. Find the area under the unit normal
curve which lies:

a. above z = 1.00;

b. below z = 2.00;

c. above z = 1.64;

d. below z = -1.96;

e. between z = 0 and z = 3.00;

f. above z = -0.50;

g. between z = -1.50 and z = 1.50.

2. Find the ordinates of the unit normal distribution above each of the following
z scores:

a. z = 1.00;

b. z = -1.00;

c. z = 2.25;

d. z = -0.15.

3. Find the z scores which are exceeded by the following proportions of the area
under the unit normal distribution:

     Proportion of area
     above z score          z score

a.        0.50                 0
b.        0.16               +1.00
c.        0.84
d.        0.05
e.        0.005
f.        0.995
g.        0.10

4. If in the general population of children Stanford-Binet IQ's have a nearly normal
distribution with mean 100 and standard deviation 16, find the percentile
equivalent of each of the following IQ's:

     IQ        Percentile equivalent

a.   100
b.   120
c.    75
d.    95
e.   140

5. X and Y have a bivariate normal distribution. Both X and Y have means of
zero and variances of 1. Suppose that for X = 1.25 the variance of the associated
Y variable is 0.50. What is the variance of the Y variable associated with a value
of 1.50 for the X variable?



7


MEASURES OF RELATIONSHIP


7.1

INTRODUCTION


In this chapter we shall begin the study of the description of relationships
between variables. When we leave this general topic at the end of Chapter 9,
we hope that the major portion of the description of relationships between
variables which you will find useful will have been covered. A complete
discussion of the measurement of relationship or correlation, a subject with
a long history of research, would easily require a book rather than a
chapter. The topics we shall purposely overlook can be found in references
cited in Sec. 7.9 at the end of this chapter.


7.2 

THE PEARSON PRODUCT-MOMENT
CORRELATION COEFFICIENT


We are often concerned with the way in which two variables relate
to each other for a given group of persons (classrooms, schools, nations,
etc.). For example, do students who tend to read earlier than others also
tend to have higher achievement in science in the sixth grade? Do large 
classrooms show lesser gains in knowledge over a semester than smaller 
classrooms? Is the average length of employment of teachers in a school 
directly related to the average salary for teachers? Obviously, to answer 
such questions we must make observations on each variable for a group of 
units (typically persons, but they might be classrooms, schools, counties, 
etc.). The data gathered to answer one such question might take the following 
form: 


                Stanford-Binet      Raw score on chemistry
Student no.     IQ score (X)        achievement test (Y)

     1              no                     31
     2              112                    25
     3              110                    19
     4              120                    24
     5              103                    17
     6              126                    28
     7              113                    18
     8              m                      20
     9              106                    16
    10              10S                    15
    11              128                    27
    12              109                    19


In this example, the variables observed on 12 students were IQ as 
determined by the Stanford-Binet Intelligence Scale in grade 6 and achieve- 
ment in high-school chemistry as measured by a 35-item teacher-made test. 






The relationship between the two variables can be depicted graphically in a 
presentation of the data called a scatter diagram. The scatter diagram for 
the above data appears as Fig. 7.1. 

Each unit (person in this example) is represented by a point on the scatter 
diagram. A dot or mark is placed for each person at the point of intersection 
of straight lines drawn through his IQ score perpendicular to the X axis and 
through his chemistry-test score perpendicular to the Y axis. The scatter 
diagram in Fig. 7.1 shows a moderate positive relationship between X and Y. 
As yet, however, we have no summary measure of this relationship. 

The general question of “relationship” must be given a somewhat more 
precise meaning. Is high relative standing on X of a person with respect to
a group paired with high or low relative standing of that person on Y, or 
is there no systematic pairing off of high and low relative standings? 

The standing of a person relative to others in a group on X and Y
relative to the means of two distributions is reflected in the size and algebraic
sign of the deviation scores $(X_i - \bar{X}_{.})$ and $(Y_i - \bar{Y}_{.})$,
respectively. If a person is high on both variables, as is student 11 above,
the product of $(X_i - \bar{X}_{.})$ and $(Y_i - \bar{Y}_{.})$ will be large and
positive for him. Similarly, if a person is relatively low on both X and Y,
$(X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$ will be large and positive for him also
(because the product of two negative numbers is a positive number). Now if X
and Y are directly related (high paired with high and low with low)
substantially, most of the products $(X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$
will be positive; consequently, the sum of these products for all persons
[i.e., $\sum_{i=1}^{n} (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$] should be large
and positive.

If X and Y bear an inverse relationship to each other (high X paired with
low Y and vice versa), many persons with positive $(X_i - \bar{X}_{.})$ scores will
tend to have negative $(Y_i - \bar{Y}_{.})$ scores, and negative
$(X_i - \bar{X}_{.})$ scores will tend to be paired with positive
$(Y_i - \bar{Y}_{.})$ scores. In this case, the products
$(X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$ will generally be negative. Hence,

$$\sum_{i=1}^{n} (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$$

will be negative when X and Y are inversely related.

If X and Y bear no systematic relationship to each other (high X's are
as likely to be paired with low Y's as with high Y's, and the same is true of
low X's), then of the people with large positive $(X_i - \bar{X}_{.})$ scores some
will have positive $(Y_i - \bar{Y}_{.})$ scores and others will have negative
$(Y_i - \bar{Y}_{.})$ scores. When the products
$(X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$ are formed, some will be
positive and others will be negative. The sum of the products,

$$\sum_{i=1}^{n} (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.}),$$

should contain about an even balance of positive and negative terms of the
same size and should therefore be relatively close to zero.
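
The three cases just described can be seen in a few lines of code. This is a minimal sketch using small made-up score lists (the numbers are hypothetical and chosen only so the arithmetic is easy to follow); `sum_cross_products` is an illustrative name for the sum of products of deviation scores.

```python
def sum_cross_products(xs, ys):
    """Sum over persons of (X_i - mean of X) * (Y_i - mean of Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


x = [1, 2, 3, 4, 5]            # hypothetical X scores
y_direct = [1, 2, 3, 4, 5]     # high X paired with high Y
y_inverse = [5, 4, 3, 2, 1]    # high X paired with low Y
y_unrelated = [2, 4, 3, 4, 2]  # no systematic pairing

print("direct:   ", sum_cross_products(x, y_direct))
print("inverse:  ", sum_cross_products(x, y_inverse))
print("unrelated:", sum_cross_products(x, y_unrelated))
```

In the third list the positive and negative products balance exactly; with real data the balance would be only approximate.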

Thus we have the quantity $\sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$ being
large and positive when X and Y are strongly related directly, being near zero
when X and Y are unrelated, and being large and negative when X and Y are
strongly related inversely. However, this sum of the products of deviation
scores is still not an adequate summary measure of relationship. For one
thing, its size depends on how many pairs of scores are included in its
calculation. Since we may wish to compare the degree of relationship between
X and Y in two groups of different size, we shall want to make the measure of
relationship independent of the size of the group on which it is calculated.
A simple averaging procedure will accomplish this. Two means calculated
on different size groups are comparable in terms of locating central scores,
but the simple sums of the two groups of scores are not. This is why we
take an average if we want a statistic to be independent of group size.
However, for the same reason that $s^2$ was defined by dividing the sum of
squared deviations by n - 1 instead of n, we should divide
$\sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})$ by n - 1.


The quantity $\sum_{i=1}^{n} (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})/(n - 1)$ is a
measure of the relationship between X and Y and is called the covariance of
X and Y. The covariance of X and Y is denoted by $s_{xy}$:

$$s_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})}{n - 1}. \tag{7.1}$$

Notice that the covariance of X with itself is simply the variance of X:

$$s_{xx} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_{.})(X_i - \bar{X}_{.})}{n - 1} = s_x^2.$$

The covariance is a perfectly adequate measure of association in many
problems in the physical sciences and engineering. (To many physicists the
hallowed “correlation coefficient” of the behavioral sciences, which we
shall encounter presently, is simply a “dimensionless covariance.”) And it is
an adequate measure as long as the scales (means and variances) of the
variables are not arbitrary and contain some meaning. Many variables with
which we deal are measured on an arbitrary scale; the mean and variance may be
whatever anyone wishes, and we are generally interested only in relative
positions in a group. This is particularly true of psychological and
educational test data.

Taking deviations of the X and Y scores around their respective means has
made the covariance independent of the means of the two groups of scores. To
make the desired measure of relationship independent of the standard
deviations of
the two groups of scores, one need only divide by $s_x$ and $s_y$. The result
is the desired measure of relationship between X and Y. It is called the
Pearson product-moment correlation coefficient and is denoted by $r_{xy}$:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}. \tag{7.2}$$
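
The definitions of the covariance and of the correlation coefficient translate directly into code. The sketch below is illustrative only: the function names and the small data lists are hypothetical, and `statistics.variance` from the Python standard library (which also divides by n - 1) is used just to confirm that the covariance of X with itself is the variance of X.

```python
import math
from statistics import variance


def covariance(xs, ys):
    """Sum of products of deviation scores divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    s_x = math.sqrt(covariance(xs, xs))
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)


x = [2, 4, 6, 8, 10]   # hypothetical scores
y = [1, 3, 2, 6, 5]    # hypothetical scores

print("covariance of X and Y:", covariance(x, y))
print("covariance of X with X:", covariance(x, x), "variance of X:", variance(x))
print("correlation:", round(pearson_r(x, y), 3))
```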



The designation r comes from the word regression. In the early
applications of the coefficient by Francis Galton and Karl Pearson
(1857-1936), it played an important role in the study of association of physical
characteristics in humans, a study that first pointed up the regressive nature
of physical measurements from one generation to the next. Although
Pearson played the most important role in establishing the mathematical
properties of $r_{xy}$, the notion of a correlation coefficient equal to
$s_{xy}/(s_x s_y)$ can be traced through the writings of Galton back to an
article published in 1846 by the Frenchman Bravais.


7.3 

A COMPUTATIONAL
FORMULA FOR $r_{xy}$


Equation (7.2) is definitional and not convenient for computing $r_{xy}$. We
shall now derive a form more convenient for calculating $r_{xy}$ on a desk
calculator, given the raw scores X and Y. Begin with

$$r_{xy} = \frac{s_{xy}}{s_x s_y}
= \frac{\sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})/(n - 1)}
{\sqrt{\sum (X_i - \bar{X}_{.})^2/(n - 1)}\,\sqrt{\sum (Y_i - \bar{Y}_{.})^2/(n - 1)}}. \tag{7.3}$$

Note that $1/(n - 1)$ can be factored out of the two terms ($1/\sqrt{n - 1}$
from each term) in the denominator of Eq. (7.3), and it cancels the $1/(n - 1)$
in the numerator of Eq. (7.3). Also recall that since
$\sqrt{a}\sqrt{b} = \sqrt{ab}$, the terms in the denominator of Eq. (7.3) can
be combined under the radical.

$$r_{xy} = \frac{\sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})}
{\sqrt{\left[\sum (X_i - \bar{X}_{.})^2\right]\left[\sum (Y_i - \bar{Y}_{.})^2\right]}}. \tag{7.4}$$

Consider just the numerator of (7.4):

$$\sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})
= \sum X_i Y_i - \bar{X}_{.} \sum Y_i - \bar{Y}_{.} \sum X_i + n \bar{X}_{.} \bar{Y}_{.}. \tag{7.5}$$

Several steps were involved in moving from Eq. (7.4) to Eq. (7.5):
expanding a binomial, moving constants (e.g., $\bar{X}_{.}$) outside of
summation signs, and summing constants. Try to fill in the details.

Recalling that $\sum Y_i = n \bar{Y}_{.}$ and $\sum X_i = n \bar{X}_{.}$, we can
write the right side of Eq. (7.5) as follows:

$$\sum X_i Y_i - n \bar{X}_{.} \bar{Y}_{.} - n \bar{Y}_{.} \bar{X}_{.} + n \bar{X}_{.} \bar{Y}_{.}
= \sum X_i Y_i - n \bar{X}_{.} \bar{Y}_{.}. \tag{7.6}$$

If we had chosen to replace $\bar{X}_{.}$ by $\sum X_i/n$ and similarly for
$\bar{Y}_{.}$, we would have obtained

$$\sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})
= \sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}. \tag{7.7}$$

Either Eq. (7.6) or Eq. (7.7) provides a simple formula for the numerator
of $r_{xy}$. We already know a simple way to compute the denominator of $r_{xy}$.

$$\sum (X_i - \bar{X}_{.})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n}, \tag{7.8}$$

$$\sum (Y_i - \bar{Y}_{.})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}. \tag{7.9}$$

Combining Eqs. (7.7), (7.8), and (7.9) produces the following formula
for r:

$$r_{xy} = \frac{\sum X_i Y_i - (\sum X_i)(\sum Y_i)/n}
{\sqrt{\left[\sum X_i^2 - (\sum X_i)^2/n\right]\left[\sum Y_i^2 - (\sum Y_i)^2/n\right]}}, \tag{7.10}$$

which can be simplified further to become the computational formula

$$r_{xy} = \frac{n \sum X_i Y_i - (\sum X_i)(\sum Y_i)}
{\sqrt{\left[n \sum X_i^2 - (\sum X_i)^2\right]\left[n \sum Y_i^2 - (\sum Y_i)^2\right]}}. \tag{7.11}$$

Equation (7.11) is preferable to Eq. (7.10) for finding $r_{xy}$ on
a desk calculator on which “cumulative multiplication” is possible. It will
then be possible to calculate the numerator of Eq. (7.11) without divisions
and without writing any numbers down. This is also true of both
terms in brackets in the denominator of Eq. (7.11).
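
To see that the raw-score formula gives the same result as the definition, the two can be computed side by side. This is a sketch, not the authors' procedure: the function names and the short data lists are hypothetical. `r_definitional` follows the deviation-score form of Eq. (7.4) and `r_computational` follows the raw-score form of Eq. (7.11).

```python
import math


def r_definitional(xs, ys):
    """Deviation-score form: cross-products over the root of the two sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def r_computational(xs, ys):
    """Raw-score form: only n and five running sums are needed."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 3, 4, 6, 8]   # hypothetical raw scores
y = [2, 3, 5, 5, 9]   # hypothetical raw scores

print(r_definitional(x, y))
print(r_computational(x, y))
```

With integer scores, the numerator and both bracketed terms of the raw-score form are whole numbers, which is what made it convenient on a desk calculator.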

7.4

ILLUSTRATION OF THE
CALCULATION OF $r_{xy}$

Using Eqs. (7.10) and (7.11), we shall illustrate the calculation of $r_{xy}$
with data from a study of the relationship between two types of reasoning in
high-school juniors: abstract reasoning and verbal reasoning. Two tests were
constructed, one measuring abstract reasoning (X) and the other measuring
verbal reasoning (Y). The two tests were administered to 40 high school
juniors from the junior class of the only high school in an Illinois town of
about 30,000 residents. The test scores for the 40 students appear in
Table 7.1. Each test was 50 items in length, and the test score was the number
of correct answers given. A scatter diagram of the bivariate data in Table 7.1
appears in Fig. 7.2.


TABLE 7.1 RAW SCORES ON 50-ITEM TESTS OF ABSTRACT AND VERBAL
REASONING ABILITY FOR 40 ILLINOIS HIGH-SCHOOL JUNIORS*

                  X           Y                           X           Y
                  Abstract    Verbal                      Abstract    Verbal
Student           reasoning   reasoning    Student        reasoning   reasoning

Linda J.             19          17        Martin T.         38          30
Peggy Y.             32           7        Sharon L.         25          18
Deane L.             33          17        Julie E.          35          26
Constance L.         44          28        Natalie J.        22          17
William P.           28          27        Maryjean K.       40          17
Roger D.             35          31        Larry N.          42          26
Caroline E.          39          20        Michael B.        41          16
Trudy R.             39          17        Carleen M.        41          37
Peter A.             44          35        Scott C.          37          26
David E.             44          43        Sigrid K.         30          21
Cheryl G.            24          10        Jan W.            31          16
Georgia S.           37          28        Roger B.          41          37
Erma J.              29          13        Richard H.        42          37
Ronald L.            40          43        Bonita G.         24          14
Pamela J.            42          45        Rex N.            43          41
Edward B.            32          24        Richard S.        36          19
Rosa L.              48          45        Maurice D.        39          18
Karen M.             43          26        Warren W.         39          39
Roger W.             33          16        Jack G.           39          37
Richard T.           47          26        Stanley L.        48          47

* We wish to express our appreciation to Dr. J. Thomas Hastings, Director of
the Illinois Statewide Testing Program, who made these data available.

The intermediate and final calculations of $r_{xy}$ for both formulas (7.10)
and (7.11) appear in Table 7.2. All calculations were performed on a desk
calculator. (Without mechanical computation, the calculation of product-moment
correlation coefficients is usually tedious.) Perhaps the only quantity
in Table 7.2 whose origin is doubtful in your mind is
$\sum_{i=1}^{40} X_i Y_i$. This quantity
is the sum over all persons of the product of each person's X and Y scores.
For the first student in Table 7.1, Linda J., $X_1 = 19$ and $Y_1 = 17$; for the
second student, Peggy Y., $X_2 = 32$ and $Y_2 = 7$. The quantity

$$\sum_{i=1}^{40} X_i Y_i = (19 \cdot 17) + (32 \cdot 7) + \cdots + (48 \cdot 47) = 40{,}798.$$

The final calculations on the right-hand side of Table 7.2 show $r_{xy}$ to
be .67. Whether one uses Eq. (7.10) or Eq. (7.11), it will be true that the
two formulas yield the same value, within rounding error. Thus, there
appears to be a strong, direct relationship between abstract and verbal
reasoning ability as measured by the two tests.


TABLE 7.2 ILLUSTRATION OF THE CALCULATION OF $r_{xy}$ FOR THE
DATA IN TABLE 7.1

Intermediate calculations

$$n = 40$$

$$\sum_{i=1}^{n} X_i = 1465 \qquad \sum_{i=1}^{n} Y_i = 1057$$

$$\sum X_i^2 = 55{,}725 \qquad \sum Y_i^2 = 32{,}551$$

$$\sum X_i Y_i = 40{,}798$$

Final calculations

Equation (7.10):

$$r_{xy} = \frac{40{,}798 - (1465)(1057)/40}
{\sqrt{[55{,}725 - (1465)^2/40] \times [32{,}551 - (1057)^2/40]}}
= \frac{2085.375}{3091.932} = 0.67.$$

Equation (7.11):

$$r_{xy} = \frac{40(40{,}798) - (1465)(1057)}
{\sqrt{[40(55{,}725) - (1465)^2] \times [40(32{,}551) - (1057)^2]}}
= \frac{83{,}415}{123{,}677.30} = 0.67.$$


FIG. 7.2 Scatter diagram of the 40 pairs of test scores in Table 7.1.
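
The desk-calculator work summarized in Table 7.2 can be reproduced from the raw scores of Table 7.1. The sketch below enters the 40 pairs in the order of the table (left-hand column of students first, then the right-hand column) and applies the raw-score formula of Eq. (7.11); the variable names are illustrative only.

```python
import math

# (X, Y) = (abstract reasoning, verbal reasoning) for the 40 students of
# Table 7.1: left-hand column of the table first, then the right-hand column.
scores = [
    (19, 17), (32, 7), (33, 17), (44, 28), (28, 27), (35, 31), (39, 20),
    (39, 17), (44, 35), (44, 43), (24, 10), (37, 28), (29, 13), (40, 43),
    (42, 45), (32, 24), (48, 45), (43, 26), (33, 16), (47, 26),
    (38, 30), (25, 18), (35, 26), (22, 17), (40, 17), (42, 26), (41, 16),
    (41, 37), (37, 26), (30, 21), (31, 16), (41, 37), (42, 37), (24, 14),
    (43, 41), (36, 19), (39, 18), (39, 39), (39, 37), (48, 47),
]

# Intermediate calculations (left-hand side of Table 7.2).
n = len(scores)
sum_x = sum(x for x, _ in scores)
sum_y = sum(y for _, y in scores)
sum_x2 = sum(x * x for x, _ in scores)
sum_y2 = sum(y * y for _, y in scores)
sum_xy = sum(x * y for x, y in scores)

# Final calculation by the raw-score formula, Eq. (7.11).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_xy = numerator / denominator

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(f"numerator = {numerator}, r = {r_xy:.2f}")
```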



7.5

RANGE OF VALUES OF $r_{xy}$


It is somewhat difficult to prove, but $r_{xy}$ can never take on a value less
than -1 nor a value greater than +1.* (If you are dismayed by the apparent
difficulty of the proof suggested in the footnote, you can at least take
consolation in the fact that the simpler proof often given in elementary
textbooks is fallacious.) Table 7.3 is a listing of various values of $r_{xy}$
with illustrations of the type of linear relationship that exists between X
and Y for the given values of $r_{xy}$.*


TABLE 7.3 INTERPRETATION OF VALUES OF $r_{xy}$

Value of $r_{xy}$    Description of linear relationship

+1.00                Perfect, direct relationship
About +.50           Moderate, direct relationship
.00                  No relationship (i.e., 0 covariation of X with Y)
About -.50           Moderate, inverse relationship
-1.00                Perfect, inverse relationship

[A scatter diagram of Y against X accompanies each row of the table.]


* To prove that the value of $r_{xy}$ cannot exceed +1, expand
$\sum (z_x - z_y)^2$, which is always greater than or equal to zero, and use
the fact that

$$\sum z_x^2 = \sum z_y^2 = n - 1 \quad \text{and} \quad r_{xy} = \frac{\sum z_x z_y}{n - 1}.$$

To show that $r_{xy}$ cannot be less than -1, work with $\sum (z_x + z_y)^2$.
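
The proof suggested in the footnote to this section can be sketched as follows. This is one way to fill in the steps, assuming standard scores $z_x = (X_i - \bar{X}_{.})/s_x$ and $z_y = (Y_i - \bar{Y}_{.})/s_y$ with $s_x$ and $s_y$ computed using $n - 1$, so that $\sum z_x^2 = \sum z_y^2 = n - 1$ and $\sum z_x z_y = (n - 1) r_{xy}$.

```latex
\begin{align*}
0 \le \sum (z_x - z_y)^2
  &= \sum z_x^2 - 2 \sum z_x z_y + \sum z_y^2 \\
  &= (n - 1) - 2(n - 1) r_{xy} + (n - 1) \\
  &= 2(n - 1)(1 - r_{xy})
  \quad\Longrightarrow\quad r_{xy} \le +1, \\[6pt]
0 \le \sum (z_x + z_y)^2
  &= \sum z_x^2 + 2 \sum z_x z_y + \sum z_y^2 \\
  &= 2(n - 1)(1 + r_{xy})
  \quad\Longrightarrow\quad r_{xy} \ge -1.
\end{align*}
```

Both conclusions use only the fact that a sum of squares cannot be negative and that $n - 1$ is positive.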

In Table 7.4 some representative correlation coefficients are presented. 

TABLE 7.4 TYPICAL VALUES OF $r_{xy}$

Descriptions of variables
                                                        Nature of          Typical value
X                            Y                          subjects           of $r_{xy}$

Iowa Test of Educational     Freshman grade-point       Over 600 college       .58
Development, Grade 9         average in college         students

The Stanford-Binet           The same test given        Elementary-school      .90
Intelligence Test IQ         one week later             pupils

Verbal reasoning ability     Nonverbal reasoning        High-school            .65
(as measured by the          ability                    juniors
Differential Aptitude
Test)

Height                       Achievement in             Male college           .00
                             college physics            seniors


As you gain experience with data, you will begin to develop a “feel”
for the degree of relationship indicated by a particular value of r. We
hesitate to apply descriptive adjectives to values of r, such as calling an r
of .80 “high” or an r of .20 “low.” Whether an r is “high,” “low,” or
“moderate” depends upon how the two variables being correlated have been
related in the past, what use one intends to make of the relationship between
the variables, etc. Moreover, why substitute a vague and ambiguous adjectival
description for a value of r when it is so simple to report the value?

Erlenmeyer-Kimling and Jarvik (1963) presented data that are enlightening
illustrations of the use of r. They reported typical values of r for large
classes of published studies in which the intelligence-test scores of children
were correlated with those of related and unrelated children. For example,
the typical value of the correlation coefficient between a child's IQ (X) and
the IQ of his identical twin (Y) for a large set of identical-twin pairs was
.88 when the twins were reared together. The typical correlation between IQ's
of identical twins reared apart was .75. These and other data are reported in
Table 7.5.

* The linear relationship is a special type of relationship [...]. For more
on curvilinear relationships between X and Y, see [...] in this chapter.


TABLE 7.5 CORRELATIONS BETWEEN IQ's OF RELATED OR UNRELATED
CHILDREN AS A FUNCTION OF GENETIC SIMILARITY AND
SIMILARITY OF ENVIRONMENT

Nature of pairing                      Typical value of $r_{xy}$

Identical twins, reared together               .88
Identical twins, reared apart                  .75
Fraternal twins of same sex                    .53
Fraternal twins of opposite sex                .53
Siblings, reared together                      .49
Siblings, reared apart                         .46
Parent with own child                          .52
Foster parent with child                       .19
Unrelated, reared together                     .16


7.6 

THE EFFECT ON $r_{xy}$ OF
TRANSFORMING SCORES

Often the mean and variance of the scores on X and Y are arbitrary. We
can change them at will without consequence, it seems. But does the value
of $r_{xy}$ depend on the means and variances of X and Y? The answer is No.
This answer was implicit in our development of the formula for $r_{xy}$; now we
wish to make it more explicit.

The mean and variance of X (or Y) can be changed to any value we
desire by multiplying X by a constant $b \neq 0$, and adding a constant a to the
product, i.e., by forming bX + a. This process is called “taking a linear
transformation of X.” Suppose we take another (or perhaps the same)
linear transformation of Y, dY + c, where $d \neq 0$. Is the correlation
coefficient between X and Y the same as that between bX + a and dY + c?

The correlation of bX + a and dY + c is the covariance of the two
divided by the product of the standard deviations. We know that adding
a constant to a variable does not change its standard deviation and that
multiplying a variable by a constant multiplies the standard deviation of
that variable by the absolute value of the constant. Thus, the standard
deviation of bX + a is $|b| s_x$, and dY + c has a standard deviation of
$|d| s_y$:

$$s_{bX+a} = |b|\, s_x \quad \text{and} \quad s_{dY+c} = |d|\, s_y. \tag{7.12}$$

The covariance of bX + a and dY + c is

$$s_{(bX+a)(dY+c)} = \frac{\sum_{i=1}^{n} [bX_i + a - (b\bar{X}_{.} + a)][dY_i + c - (d\bar{Y}_{.} + c)]}{n - 1}.$$

This expression reduces to the following:

$$\frac{\sum (bX_i - b\bar{X}_{.})(dY_i - d\bar{Y}_{.})}{n - 1}
= \frac{bd \sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.})}{n - 1}
= bd\, s_{xy}. \tag{7.13}$$

In words, the covariance of bX + a and dY + c equals bd times the
covariance of X and Y. We can combine the results in Eqs. (7.12) and (7.13)
into an expression for the correlation of bX + a and dY + c.

$$r_{(bX+a)(dY+c)} = \frac{bd\, s_{xy}}{|b|\, s_x\, |d|\, s_y} = \frac{bd}{|b|\,|d|}\, r_{xy}. \tag{7.14}$$

In words, the correlation between bX + a and dY + c equals $r_{xy}$ times the
product of b and d over the product of the absolute values of b and d. As
an example, suppose that X is transformed into 3X + 5 and Y into 2Y + 8.
Then

$$r_{(3X+5)(2Y+8)} = \frac{(3)(2)}{|3|\,|2|}\, r_{xy} = r_{xy}.$$

This particular transformation of X and Y had no effect on the
correlation $r_{xy}$. In fact the ratio of bd to $|b|\,|d|$ in Eq. (7.14) can
never be anything but +1 or -1. Hence, no linear transformation of either X
or Y (provided neither b nor d is zero) can change the size of the correlation
between X and Y, though it may change the sign of the correlation. If either
b or d, but not both, is negative, the correlation of bX + a and dY + c will
equal $-r_{xy}$. These results are summarized in Table 7.6.


TABLE 7.6 THE EFFECT OF LINEAR TRANSFORMATIONS OF X AND Y ON
THE VALUE OF $r_{xy}$ [SPECIAL CASES OF EQ. (7.14)]

bX + a             dY + c             Value of $r_{(bX+a)(dY+c)}$

b is positive      d is positive      $r_{xy}$
b is negative      d is positive      $-r_{xy}$
b is positive      d is negative      $-r_{xy}$
b is negative      d is negative      $r_{xy}$
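
The four rows of the table on linear transformations can be checked numerically. The sketch below uses hypothetical scores and illustrative function names (`pearson_r`, `linear`); it applies the transformations 3X + 5 and 2Y + 8 used in the text's example, and then repeats the calculation with negative multipliers.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def linear(values, multiplier, constant):
    """Apply the linear transformation multiplier * value + constant."""
    return [multiplier * v + constant for v in values]


x = [1, 3, 4, 6, 8]   # hypothetical scores
y = [2, 3, 5, 5, 9]   # hypothetical scores

r = pearson_r(x, y)
r_both_positive = pearson_r(linear(x, 3, 5), linear(y, 2, 8))
r_b_negative = pearson_r(linear(x, -3, 5), linear(y, 2, 8))
r_d_negative = pearson_r(linear(x, 3, 5), linear(y, -2, 8))
r_both_negative = pearson_r(linear(x, -3, 5), linear(y, -2, 8))

print(round(r, 4), round(r_both_positive, 4), round(r_b_negative, 4),
      round(r_d_negative, 4), round(r_both_negative, 4))
```

Only the signs of the multipliers matter; the added constants and the sizes of the multipliers drop out.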



7.7 

INTERPRETING CORRELATION 
COEFFICIENTS 


A. Causation and Correlation 

The presence of a correlation between two variables does not necessarily 
mean there exists a causal link between them. Even though concomitance 
(correlation) between events can be useful in identifying causal relationships 
when coupled with other methodological approaches, it is a dangerous and 
potentially misleading test for causation when used alone. First, even when 
one can presume that a causal relationship does exist between the two
variables being correlated, $r_{xy}$ can tell nothing by itself about whether X causes
Y or Y causes X. Second, often variables other than the two under con- 
sideration are responsible for the observed association. Third, the relation- 
ships that exist among variables in education and the social sciences are 
almost always too complex to be explained in terms of a single cause. 
Achievement in school is the resultant of numerous influences, in addition 
to being a complex concept itself which cannot be described adequately by 
any single measurement. 

We shall examine some examples of the problems that arise in attempts 
to unearth causal relationships with correlational techniques. It is probably 
true that in the United States there is a positive correlation between the average 
salary of teachers in high schools and the percent of the school’s graduates 
who enter college. Does this imply that a well-paid teaching staff causes
better trained high-school graduates? Would the percent of high-school
graduates entering college rise if we increased the pay of teachers? Certainly
affirmative answers to these questions are not justified by the associational
relationship alone. The relationship between the two factors is not simple, 
but one prominent variable not yet mentioned is the financial and economic 
condition of the community that largely determines its ability to pay both 
teachers’ salaries and college tuitions. Moreover, the economic and financial 
condition of the community is in part dependent upon the intellectual powers 
of its citizens, another variable that contributes to both higher teachers’ 
salaries and greater college attendance among the young people. 

It has been found that the percent of “dropouts” in each of a number of 
high schools is negatively correlated with the number of books per pupil in 
the libraries of those schools. But common sense tells us that piling more 
books into the library will no more affect the dropout rate than hiring a 
better truant officer will bring about a magical increase in the holdings of the 
school library. If only common sense always served us so well!

Many researchers do not stop with one fallacious conclusion, i.e., that 
correlation is prima facie evidence for causation, but draw a second one as 
relationships, not techniques to demonstrate noncausal ones. There are
only relatively few of the former and they are valuable, but noncausal re- 
lationships exist in superabundance and the discovery of one is little cause 
for celebration. For further discussion, see Blalock (1961) and Campbell 
and Stanley (1963). 

B. Presence of Identifiable Groups 
of Subjects with Different Means 

A substantial correlation between two variables is a fact that can be ex- 
plained differently in different situations. Some correlations result from 
measuring a cause and its effect, e.g., when X is food intake in a period of 
one month and Y is weight gain over the same period. Other correlations 
result from measuring two variables with a common cause or influence, e.g., 
when X is achievement in English and Y is achievement in social studies. 
Still other correlations result when two distinct groups of persons within 
which X and Y are unrelated are pooled together.

Suppose that girls report greater anxiety than boys on an inventory 
such as Taylor’s Manifest Anxiety Scale. It is well known that girls tend to 
score higher than boys on achievement tests in English, especially in the 
intermediate grades. The scatter diagram of the anxiety and English 
achievement scores for 15 boys and 15 girls might look like the one in Fig. 7.3.

The scatter diagram in Fig. 7.3 shows a moderately strong positive 
relationship between anxiety and English achievement when boys’ and girls' 
scores are pooled. Does this mean that anxiety (tension) makes a student 
work harder and thus achieve more? Not at all. For if it did, why would 
one obtain no relationship between the two variables for boys and girls 
separately? 

Figure 7.3 shows that nonzero correlations can result when distinct 
groups, e.g., boys and girls, with different average values on the two variables 
are pooled together. Either positive or negative relationships can result 
from this pooling. Sketch the scatter diagram that would result if one pools
two groups for which X and Y are uncorrelated and for which group A
has a high mean on X and a low mean on Y and group B has a low mean on
X and a high mean on Y. Does the resulting scatter diagram correspond
to a zero, a positive, or a negative correlation between X and Y?

FIG. 7.3 Scatter diagram of anxiety and English achievement

The identification of subgroups with differing means on X and Y does
not negate the fact that X and Y correlate. However, it may provide a
more rational explanation of why $r_{xy}$ is substantially different from zero.
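
The pooling effect in the anxiety example can be imitated with a few made-up numbers. This is a sketch under stated assumptions: the scores below are hypothetical (they are not the data of Fig. 7.3), and `pearson_r` is an illustrative helper. Within each group X and Y are exactly uncorrelated, yet the pooled scores correlate strongly because one group is higher on both variables.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: X = anxiety, Y = English achievement.
boys_x = [1, 2, 3, 1, 2, 3]
boys_y = [1, 1, 1, 3, 3, 3]
# The second group has the same pattern, shifted upward on both variables.
girls_x = [x + 4 for x in boys_x]
girls_y = [y + 4 for y in boys_y]

r_boys = pearson_r(boys_x, boys_y)
r_girls = pearson_r(girls_x, girls_y)
r_pooled = pearson_r(boys_x + girls_x, boys_y + girls_y)
print(r_boys, r_girls, r_pooled)
```

The pooled coefficient here reflects nothing but the difference between the two group means.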


C. Curvilinearity and the Shapes 
of Marginal Distributions 


Of all the possible ways in which measurements on two variables can be
related, $r_{xy}$ measures only one type. The value of $r_{xy}$ is a measure of the
degree of linear relationship between X and Y. If X and Y are perfectly
linearly related, the points in the scatter diagram will fall on a single straight
line, as in Table 7.3. If we scatter the points in such a scatter diagram
above and below the line in a haphazard manner, we obtain various degrees
of basically linear relationship between X and Y. If the points in a scatter
diagram appear to depart haphazardly from a curved line, the relationship
between X and Y may be basically curvilinear. To say that $r_{xy}$ measures
linear relationship between X and Y means that different sorts of curvilinear
relationships between X and Y may produce values of $r_{xy}$ that cannot be
properly interpreted without reference to a scatter diagram of the data.
Two variables might be highly related curvilinearly and yet $r_{xy}$ might
be zero. Figure 7.4 shows two different scatter diagrams, each of which has
a correlation coefficient of approximately zero. However, even though the
scatter diagrams A and B in Fig. 7.4 both have correlation coefficients of
zero, there is considerable relationship present in B, while any systematic
relationship between X and Y is lacking in A. The single illustration in
Fig. 7.4 should be sufficient warning never to draw a rash conclusion that
two variables are unrelated merely because $r_{xy}$ is zero. Various degrees of
curvilinearity of relationship between measures of variables are not uncommon.
Educational and psychological test scores frequently show “ceiling” or
“cellar” effects with atypical groups of persons, i.e., the tests may be too
easy or too difficult with the result that many persons obtain the highest or
the lowest test score. The scatter diagram of test scores for test A, which
shows a “ceiling effect,” and test B, which shows a “cellar effect,” might
look like the one in Fig. 7.5.

FIG. 7.4 Two instances of approximately zero product-moment correlation:
(A) random relationship; (B) high curvilinear relationship.

FIG. 7.5 The scatter diagram of scores for test A (which is too easy) and
test B (which is too difficult) for the group tested.

The value of $r_{AB}$ for the data in Fig. 7.5 is not large; it is probably
only about .30. In the range for which both tests are of appropriate difficulty,
the two tests appear to be more highly related. One suspects that if test A
were made more difficult and test B easier without radically affecting the
content of either test, the value of $r_{AB}$ for these persons would increase.
The scatter diagram of the test scores for such altered tests would probably
show less curvilinearity than now. (This example illustrates another important
point: the degree of relationship obtained between any two variables,
regardless of how the relationship is expressed, depends on the nature of
the measurement of the variables. For example, we generally think of the
characteristics weight and height as being rather closely related in adult
human beings; but it is not difficult to conceive ways of measuring each
variable that are so poor (e.g., measurement by the sober subjective judgments
of four-year-olds) that weight and height scores would show almost no
correlation.)
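
That a perfect curvilinear relationship can coexist with a correlation of zero is easy to verify. The sketch below is illustrative only: the scores are hypothetical, with Y taken to be exactly the square of X, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: Y is completely determined by X, but not linearly.
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x ** 2 for x in X]

r_curved = pearson_r(X, Y)  # the positive and negative cross products cancel
r_linear = pearson_r(X, [2 * x + 1 for x in X])  # a perfect straight line
print(r_curved, r_linear)
```

Knowing X fixes Y exactly in both cases; only the scatter diagram reveals that the first relationship is curved.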


7.8 

FURTHER REMARKS ON THE 
INTERPRETATION OF $r_{xy}$

Carroll (1961) presented a readable account of how the interpretation of $r_{xy}$
is dependent upon the shapes of the distributions of X and Y and of their
joint distribution. Carroll’s article is an excellent statement of many points
touched only briefly here and parts of it will be clear to the student
whose acquaintance with correlation does not extend beyond this and the
next two chapters. He made the following observation on both the problem
of interpreting $r_{xy}$ and the statistical training of students:


“Students are not adequately informed that these limits [−1 to +1]
and meanings [“highly related,” “moderately related,” “not related”]
strictly have reference to certain statistical models. Two of the most frequently
used models are the normal bivariate surface [see Sec. 6.6] and the
linear regression model [see Chapter 8] . . . . No assumptions are necessary for
the computation of a Pearsonian coefficient, but the interpretation of its
meaning certainly depends upon the extent to which the data conform to an
appropriate statistical model for making this interpretation. As actual data
depart from a fit to such a model [e.g., the bivariate normal surface], the
limits of the correlation coefficient may contract, and the adjectival interpretations
are less meaningful.”

As an example of how the maximum value of $r_{xy}$ might be bounded away
from +1 when the frequency distributions of X and Y are skewed in different
directions, consider scores X and Y with the following frequency
distributions:


Score on X: 0 1 2 3 4 5 6 7 

Frequency: 2 [ i 2 14 14 ,3 ]0 , 


9 

4 2 I 


Score on Y: 
Frequency: 


123456789 10 
0 I 1 2 2 4 5 6 7 71 


In other words, even if the maximum possible linear relationship existed
between these X and Y scores, $r_{xy}$ would be only about .60. This reflects no
weakness of $r_{xy}$ as a descriptive measure; it cannot be blamed for not doing
what it was never designed to do. The fact that $r_{xy}$ cannot be much larger
than .60 in the above example should actually be rather comforting. When X
has many values below its mean and Y has many values above its mean, it is
impossible for all positive deviations of the $Y_i$ around $\bar{Y}$ to be associated
with positive deviations of the $X_i$ around $\bar{X}$. Only if the X and Y
distributions have identical shapes can $r_{xy}$ attain the value +1; in general,
the limits of $r_{xy}$ depend on the distributions of X and Y.

Suppose that an $r_{xy}$ of .60 is obtained. What does it reflect? Moderate
relationship between two variables whose joint frequency distribution looks
like a bivariate normal surface, or the maximum possible relationship between
a positively skewed X and a negatively skewed Y? Earlier we saw how similar
doubts surround a zero value for $r_{xy}$. Are X and Y actually unrelated or is
the relationship between them nonlinear? The most satisfactory resolution of
all of these doubts can be made by looking at a scatter diagram of the X and
Y scores. From such a diagram it can be seen immediately whether or not X
and Y have a pronounced curvilinear relationship and whether X and Y are
markedly skewed. Regrettably, researchers are too hesitant to construct
scatter diagrams. One would think that one beneficial effect of electronic
data processing would be increased plotting of scatter diagrams in
correlational problems, either by hand or by computer. This has not yet
happened. In the authors’ judgment, building and inspecting the scatter
diagram so that $r_{xy}$ can be interpreted more intelligently are well worth
the little effort they take.
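
The ceiling on the correlation imposed by differently skewed distributions can be illustrated with a small computation. This is a sketch with hypothetical scores, not the frequency distributions discussed above. It relies on a standard fact (the rearrangement inequality): for fixed sets of X and Y scores, the sum of cross products, and hence the correlation, is largest when the scores are paired in rank order.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: X is positively skewed, Y is negatively skewed.
X = [1, 1, 1, 1, 2, 2, 3, 5]
Y = [5, 5, 5, 5, 4, 4, 3, 1]

# Pair the lowest X with the lowest Y, the next lowest with the next, etc.
r_max = pearson_r(sorted(X), sorted(Y))

# When both distributions have the same shape, rank-order pairing gives +1.
r_same_shape = pearson_r(sorted(X), sorted(X))
print(r_max, r_same_shape)
```

No other pairing of these particular X and Y scores can yield a larger coefficient than `r_max`, which falls well short of +1.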


7.9 

THE VARIANCE OF SUMS AND 
DIFFERENCES OF VARIABLES 


Quite often in education and psychology one wishes to find the variance of
a group of summed X and Y scores. Moreover, simply inspecting the formula
that relates $s^2_{x+y}$, the variance of the summed X and Y scores, to $s_x^2$, $s_y^2$, and
$r_{xy}$ can illuminate the way in which influences combine to produce joint
effects. In the history of mental test theory, a general expression for the
variance of a sum of variables has played an important role (a total test
score is the sum of scores on the individual items of the test).

The variance of X + Y, where each of the n sums is $X_i + Y_i$, has the
following definition:

$$s^2_{x+y} = \frac{\sum_{i=1}^{n} [(X_i + Y_i) - (\bar{X} + \bar{Y})]^2}{n - 1}. \tag{7.15}$$

The terms inside the brackets in Eq. (7.15) can be rearranged to produce

$$s^2_{x+y} = \frac{\sum_{i=1}^{n} [(X_i - \bar{X}) + (Y_i - \bar{Y})]^2}{n - 1}. \tag{7.16}$$

If the bracketed expression in the numerator of Eq. (7.16) is expanded
and the summation sign is distributed over the terms after expansion, one
obtains

$$s^2_{x+y} = \frac{\sum (X_i - \bar{X})^2}{n - 1} + \frac{2 \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} + \frac{\sum (Y_i - \bar{Y})^2}{n - 1}. \tag{7.17}$$

You will recognize immediately that the first and last terms on the
right-hand side of Eq. (7.17) are $s_x^2$ and $s_y^2$, respectively. The middle term is
simply 2 times the covariance of X and Y, $s_{xy}$. Thus,

$$s^2_{x+y} = s_x^2 + s_y^2 + 2 s_{xy}. \tag{7.18}$$

One way of denoting $r_{xy}$ is by $s_{xy}/(s_x s_y)$. Obviously then, $s_{xy} = r_{xy} s_x s_y$.
Therefore, replacing $s_{xy}$ by an equivalent expression gives

$$s^2_{x+y} = s_x^2 + s_y^2 + 2 r_{xy} s_x s_y. \tag{7.19}$$

Equations (7.18) and (7.19) relate the variance of the sum of two arrays
of scores to the variance of each array and the covariance of the arrays.

An important special case of Eq. (7.19) is that in which X and Y are
uncorrelated, i.e., $r_{xy} = 0$. If this is true, then

$$s^2_{x+y} = s_x^2 + s_y^2. \tag{7.20}$$

What is an expression equivalent to $s^2_{x-y}$?

$$s^2_{x-y} = \frac{\sum_{i=1}^{n} [(X_i - Y_i) - (\bar{X} - \bar{Y})]^2}{n - 1} = s_x^2 + s_y^2 - \frac{2 \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$= s_x^2 + s_y^2 - 2 s_{xy} = s_x^2 + s_y^2 - 2 r_{xy} s_x s_y.$$

We see an interesting development. The variance of the differences between
persons' scores on X and Y equals the variance of X plus the variance
of Y minus twice the covariance of X and Y (or twice $r_{xy}$ times
$s_x$ and $s_y$). Again, if X and Y are uncorrelated, then

$$s^2_{x-y} = s_x^2 + s_y^2. \tag{7.21}$$

You may find it difficult to reconcile Eq. (7.20) with Eq. (7.21). Perhaps
the following will help: if X and Y are uncorrelated, then X and −Y are
uncorrelated as well. Hence, $s^2_{x+(-y)} = s_x^2 + s^2_{-y}$ because X and −Y are
uncorrelated. Multiplying Y by −1 has no effect on the variance of Y, because
$s^2_{-y} = (-1)^2 s_y^2 = s_y^2$. Of course, X + (−Y) is just X − Y. Hence,
$s^2_{x-y} = s_x^2 + s_y^2$.

Manipulating the formulas of statistics can be illuminating also. For
example, we know from Eq. (7.19) that

$$s^2_{x+y} - s_x^2 - s_y^2 = 2 r_{xy} s_x s_y.$$

Therefore, if we divide the above equation through by $2 s_x s_y$, we find that

$$\frac{s^2_{x+y} - s_x^2 - s_y^2}{2 s_x s_y} = r_{xy}.$$
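
The relations among the variance of a sum, the variance of a difference, and the covariance can be confirmed numerically. The sketch below uses hypothetical scores and illustrative helper names (`variance`, `covariance`); variances are computed with n − 1 in the denominator, as in this chapter.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def variance(v):
    """Variance with n - 1 in the denominator."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def covariance(u, v):
    """Covariance with n - 1 in the denominator."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Hypothetical scores for five persons.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]

s2x, s2y, sxy = variance(X), variance(Y), covariance(X, Y)
var_sum = variance([x + y for x, y in zip(X, Y)])   # variance of X + Y
var_diff = variance([x - y for x, y in zip(X, Y)])  # variance of X - Y

rhs_sum = s2x + s2y + 2 * sxy    # right-hand side for the sum
rhs_diff = s2x + s2y - 2 * sxy   # right-hand side for the difference

# The correlation recovered from the variance of the sum.
r_from_sum = (var_sum - s2x - s2y) / (2 * sqrt(s2x) * sqrt(s2y))
r_direct = sxy / (sqrt(s2x) * sqrt(s2y))
print(var_sum, rhs_sum, var_diff, rhs_diff, r_from_sum, r_direct)
```

For these scores the covariance is positive, so the sums vary more, and the differences vary less, than they would if X and Y were uncorrelated.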

Suppose one has three variables X, Y, and Z. What would be the variance
of the sums formed by adding together a person’s scores on these three
variables to secure the sum $X_i + Y_i + Z_i$?

$$s^2_{x+y+z} = \frac{\sum_{i=1}^{n} [(X_i + Y_i + Z_i) - (\bar{X} + \bar{Y} + \bar{Z})]^2}{n - 1}. \tag{7.22}$$

The terms in brackets in Eq. (7.22) can be rearranged to yield

$$s^2_{x+y+z} = \frac{\sum_{i=1}^{n} [(X_i - \bar{X}) + (Y_i - \bar{Y}) + (Z_i - \bar{Z})]^2}{n - 1}. \tag{7.23}$$

The numerator of Eq. (7.23) is a trinomial squared. You may recall
from high-school algebra that $(a + b + c)^2 = a^2 + b^2 + c^2 + 2ab + 2ac + 2bc$.
Hence,

$$s^2_{x+y+z} = \frac{\sum (X_i - \bar{X})^2}{n - 1} + \frac{\sum (Y_i - \bar{Y})^2}{n - 1} + \frac{\sum (Z_i - \bar{Z})^2}{n - 1}$$

$$+ \frac{2 \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} + \frac{2 \sum (X_i - \bar{X})(Z_i - \bar{Z})}{n - 1} + \frac{2 \sum (Y_i - \bar{Y})(Z_i - \bar{Z})}{n - 1}.$$

All of the terms to the right of the equal sign in the above equation are
variances or covariances. The entire expression can be reduced to

$$s^2_{x+y+z} = s_x^2 + s_y^2 + s_z^2 + 2 s_{xy} + 2 s_{xz} + 2 s_{yz},$$

which is the same as

$$s^2_{x+y+z} = s_x^2 + s_y^2 + s_z^2 + 2 r_{xy} s_x s_y + 2 r_{xz} s_x s_z + 2 r_{yz} s_y s_z.$$
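
The three-variable result can be checked in the same way. The following sketch is illustrative only; the scores and the helper names are hypothetical. It compares the variance of the summed scores with the sum of the three variances plus twice each of the three covariances.

```python
def mean(v):
    return sum(v) / len(v)

def covariance(u, v):
    """Covariance with n - 1 in the denominator; covariance(v, v) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Hypothetical scores for five persons on three variables.
X = [2, 4, 6, 8, 10]
Y = [1, 3, 2, 5, 4]
Z = [5, 1, 4, 2, 3]

totals = [x + y + z for x, y, z in zip(X, Y, Z)]
var_total = covariance(totals, totals)

# Three variances plus twice each of the three covariances.
rhs = (covariance(X, X) + covariance(Y, Y) + covariance(Z, Z)
       + 2 * covariance(X, Y) + 2 * covariance(X, Z) + 2 * covariance(Y, Z))
print(var_total, rhs)
```

The same pattern extends to any number of variables: every variance appears once and every pairwise covariance appears twice.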

The problems of variances of sums of and differences between variables 
are very important in intermediate and advanced statistics. A thorough 
understanding of these concepts is requisite to most work in mental test 
theory, factor analysis, and many other areas which lie outside of statistics. 
You would do well to thoroughly understand the material in this section 
before moving on. If you need further instruction in this area, Edwards
(1964, pp. 15-23) should be helpful. 



7.10 

OTHER MATERIAL ON 
CORRELATION 

This chapter does not contain the entire treatment of correlation in this 
text. Chapter 8 has as its subject the problem of least-squares estimation, 
which is closely related to correlation (see Sec. 8.4). In Chapter 9, correlation 
coefficients are presented for correlating scores that are nominal and ordinal. 
These three chapters by no means exhaust the subject of measuring relation- 
ships. Those aspects of the subject which have been slighted here can be 
covered adequately by consulting Ezekiel and Fox (1963), DuBois (1957), 
and Kruskal (1958). 


PROBLEMS AND EXERCISES 

1. Prove that the correlation of X and Y equals +1 when $z_x = z_y$.
Recall that $r_{xy} = \sum z_x z_y/(n - 1)$. Since a person's z score on
X is assumed to be identical to his z score on Y, substitute $z_x$ for $z_y$ in the
formula for $r_{xy}$. Then prove that $\sum z_x^2/(n - 1) = +1$. See Prob. 8 in Chapter

2 ' ^oX^He'tta' ° r hci s ht i" inches X and running speed in 

r,t^he H ;:, b ^ 

omo f' tKe 

x ssrss: ° f - - - 

of all elementary -school £pils i„,te tKuedSu.m ” P°P"' a, ' on 

a. X heigh, in inches; y. weigh, in d , 

*>- x , age m months between 6 and 16 year*- v • 

run 50 yards. * rs * *• ,lmc ,n seconds required to 

" X, arithmetic achievement 

by hi?ieache” d ' n1 ' K ' c,,, “ mhi P" ™ing of student on a 10-point scale 

absent from schoo'i'duTing the ^^ c "P Iaccmcnt ™its; Y, number of days 

4. For a particular set of data * & 

-id possibly be? „ lm , : ^ J is the iarges, tbat 

5. The correlation of X with Y is .60; the correlation of X with Z is −.80. Is X
more closely linearly related to Y or to Z?

6. A researcher demonstrated a correlation of −.52 between average teacher
salary X and the proportion of students who drop out of school before
graduation Y, across 120 high schools in his state. He concluded that increasing
teachers' salaries would reduce the “drop-out rate.” Comment on his
conclusion.


7. a. Find the value of the correlation coefficient r for the following data: 


Person     X      Y

  1       100     28
  2        90     25
  3       126     19
  4       112     24
  5        80     23
  6       115     21
  7       105     27
  8       110     25
  9        99     26
 10        97     25
 11        87     23
 12        76     18
 13       100     29
 14        80     20
 15       120     18

b. Plot a scatter diagram for the above data. 

c. Does the relationship between X and Y— if there is a nonzero one— appear 
to be predominantly linear or curvilinear? 

8. Compute the r for data set a, below, and the r for data set b. Why do the r’s 
differ in magnitude? 


a. Person     IQ        IQ
   number     Test A    Test B

     1          80        83
     2         105       101
     3         121       117
     4          93       100
     5          99        96
     6         107       112
     7         119       123
     8         103        99
     9         102       110
    10         115       110
    11          87        81
    12          96        98

b. Person     Test score     Arith.
   number     gen. vocab.    reasoning

     1          96            104
     2         111            121
     3          89             84
     4         107             91
     5         102            114
     6         115             96
     7          98            109
     8          83             94
     9         104            116
    10         100             86
    11         117            101
    12          94             99

9. It has been shown that women tend to score much higher (i.e., have more
“accepting” attitudes) than men on the Minnesota Teacher Attitude Inventory.
A researcher correlated the MTAI scores of a group of 100 experienced
secondary teachers with the number of students each teacher failed in a year.
He obtained an r of −.39. He concluded that teachers tend to fail students
because they do not have “accepting” attitudes toward students. Comment
on the researcher’s methods and conclusions.

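10. $$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} [X_i (Y_i - \bar{Y}) - \bar{X}(Y_i - \bar{Y})]$$

$$= \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - \bar{X} \sum_{i=1}^{n} (Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - \bar{X}(0).$$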

In words, the sum of the cross products of the X and Y scores, both in deviation 
score form, equals the sum of the cross products of X, not in deviation score 
form, and the Y's in deviation score form. This seems paradoxical. Has a 
mistake been made in the proof? Or is it truly a fact? 



8 


PREDICTION 

AND 

ESTIMATION 


8.1 

PRELIMINARIES 


The procedures involved in the simplest form of statistical prediction
are best illustrated by means of a few elementary notions in analytic
geometry: the idea of a two-dimensional coordinate system and the equation
for a straight line in this system. The Cartesian* coordinate system divides
a plane (a surface without depth) into four quadrants. It is a means of
marking off the plane so that every point in the plane can be identified by a
pair of numbers (X, Y). The point (0, 0) is called the origin of the system
and is where the X and Y lines cross, i.e., where the two axes intersect. The
first number of any pair is the distance one must travel horizontally from
the origin (the X distance) to reach the point, and the second number is the
distance the point lies vertically from the X axis (the Y distance).

* Named after the Frenchman René Descartes.

The point A in Fig. 8.1 corresponds to the pair of numbers (2, 2).
The first number is called the X-coordinate; the second number is called
the Y-coordinate. The point B corresponds to the pair of numbers (−2, 1),
i.e., B lies 2 units to the left of the origin along the X axis and 1 unit above
it. Points in quadrant I correspond to pairs of numbers both of which are
positive; points in quadrant II correspond to pairs of numbers the first of
which is negative and the second of which is positive. What are the signs of
the numbers describing points in the third quadrant? The fourth quadrant?
What is the pair of numbers that determine the location of point C in
Fig. 8.1?

You should realize that any point in a plane corresponds to a pair of 
real numbers and that any pair of real numbers identifies a point. Thus, 
(—100, 3.67) determines a point, as does (—41.65, 214.6). 

Eventually, we shall arrive at a method of predicting a set of scores 
that uses a straight line in a plane to describe the set of predicted scores. 
It will be useful to know the manner in which any straight line in a plane can 
be completely described by a simple equation. 

In Fig. 8.2 the straight line L crosses the Y axis at the point (0, 1),
and the X axis at the point (−0.5, 0). For each unit on the X axis that the
moving point which describes the line moves to the right, it rises two units
on the Y axis. The following points lie on the line L: (−2, −3), (−1, −1),
(0, 1), (1, 3), etc. The value of Y in the description of the point (X, Y) is
systematically related to the value of X. For the line L in Fig. 8.2, the Y
value of any point on the line equals twice the X value plus 1, i.e.,
Y = 2X + 1. The equation Y = 2X + 1 is the equation for the straight line in
Fig. 8.2. The number 1 is called the Y intercept because it is the distance
above the X axis at which the line intersects with the Y axis. The number 2
is the slope of the straight line. The slope is the number of units the line
rises for each unit of movement to the right on the X axis, here 2:1.

The equation $Y = b_1 X + b_0$ is called the “general equation for a straight
line.” It says simply that the pairs of points (X, Y) that lie on any straight
line are related in such a way that for any X value the Y value paired with
it can be found by multiplying X by some number $b_1$ and adding a second
number $b_0$ to this product. This is a linear transformation of X to secure Y;
$b_1$ is a multiplicative constant, the same for every X, and $b_0$ is an additive
constant, the same for every X.
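
The general equation for a straight line can be tried out directly. This short sketch is illustrative; the function name `line` and its defaults (slope 2, intercept 1, matching the line Y = 2X + 1) are choices made for this example.

```python
def line(x, b1=2, b0=1):
    """Y value on the straight line Y = b1*X + b0."""
    return b1 * x + b0

# A few points on the line Y = 2X + 1.
points = [(x, line(x)) for x in (-2, -1, 0, 1)]

# The slope recovered from two points on the line: rise divided by run.
(x1, y1), (x2, y2) = points[0], points[-1]
slope = (y2 - y1) / (x2 - x1)

y_intercept = line(0)                # where the line crosses the Y axis
x_intercept = -y_intercept / slope   # where Y = 0, i.e., the line crosses the X axis
print(points, slope, y_intercept, x_intercept)
```

Any two distinct points on the line give the same slope, which is what makes two constants enough to describe the whole line.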


8.2 

THE PROBLEM OF 
ESTIMATING Y FROM X 
(OR X FROM Y ) 


Given an individual's score on characteristic (variable) X, what information
can be gained about his score on characteristic (variable) Y? Some examples
of the estimation problem are:*

1. How, and how well, can we predict college English grades from
high-school English grades? (The high-school grades precede the
college grades, so we can predict the latter.)

2. How well can we estimate Stanford-Binet IQ scores from California
Test of Mental Maturity IQ scores? (No antecedent-consequent
order is implied, so this is estimation to determine how nearly
equivalent the z scores of each examinee are on the two tests.)

3. How well can we predict income at age 35 from rank in high-school
graduating class?

4. How well can we estimate achievement from intelligence?

* When X precedes the Y we wish to estimate from it, then we predict (i.e., tell in
advance) Y from X, once we know the relationship between X and Y based on an earlier
group.


To derive a means of estimating the score of a person on one variable
(which we shall denote by Y) from a different variable (X), we must know
how X and Y are related. The variable we wish to estimate is called the
dependent variable (Y), and the variable that will be used to estimate it is
the independent variable (X). For example, we might wish to predict achievement
in ninth-grade mathematics (dependent variable Y) from a group
intelligence test given at the end of grade eight (independent variable X).
Measures of Y might be scores on a 50-item achievement test in ninth-grade
mathematics. We must first gather data on some number n of students whose
IQ we test in grade eight and whose achievement in ninth-grade mathematics
we later measure. With this group we establish an equation that relates X
and Y. We can then use the equation in the future with students whose score
on X we know and whose score on Y we would like to estimate. Illustrative
data can be tabulated as in Table 8.1 and graphed as in Fig. 8.3. Five sums
and the number of pairs (there n = 20) are needed to determine $b_1$, which is
the slope of the straight line for estimating Y (mathematics scores in grade
9) from X (IQ in grade 8): the sum of the 20 X scores (2165); the sum of the
20 Y scores (824); the sum of the squared X scores, which is
$X_1^2 + X_2^2 + \cdots + X_n^2 = 235{,}091$; the sum of the 20 squared Y scores
(34,442); and the sum of the products of the X scores and their paired Y
scores, which is

$$X_1 Y_1 + \cdots + X_n Y_n = 95(33) + 100(31) + \cdots + 118(48) = 89{,}715.$$

From the calculations in Table 8.1 one finds that $b_1 = .708$ and that
$b_0 = -35.441$. Thus, the equation for estimating Y's from X's is
.708X + (−35.441) = .708X − 35.441. The line that follows this formula
is shown in Fig. 8.3. It has a slope of .708. If Fig. 8.3 were drawn with the
X axis extended all the way back to X = 0, you would see that the regression
line at the X of 0 crosses the Y axis at −35.441, the Cartesian pair there
being (0, −35.441).
TABLE 8.1 DATA FOR THE DETERMINATION OF A PREDICTION LINE

        X                          Y
  Independent variable       Dependent variable
  (IQ in grade 8)            (math score in grade 9)

         95                        33
        100                        31
        100                        35
        102                        38
        103                        41

        105                        37
        106                        37
        106                        39
        106                        43
        109                        40

        110                        41
        110                        44
        111                        40
        112                        45
        112                        48

        114                        45
        114                        49
        115                        47
        117                        43
        118                        48

Calculations:

  n = 20
  $\sum X = 2165$          $\sum Y = 824$
  $\sum X^2 = 235{,}091$    $\sum Y^2 = 34{,}442$
  $\sum XY = 89{,}715$

$$b_1 = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2} = \frac{20(89{,}715) - (2165)(824)}{20(235{,}091) - (2165)^2} = \frac{10{,}340}{14{,}595} = .708$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{824}{20} - .708\left(\frac{2165}{20}\right) = -35.441$$

Least-squares prediction line:

$$\hat{Y} = .708X - 35.441$$

Other descriptive statistics:

  $s_x = 6.198$,  $s_y = 5.095$,  $s_{xy} = 27.211$,  $r_{xy} = .861$

What criteria led to the formulas for $b_1$ and $b_0$ used in Table 8.1? This
is an important question that deserves a detailed answer, as given below.

Suppose we found an equation for predicting Y from X that had satisfactory
properties. We would have two constants, $b_1$ and $b_0$, such that
multiplying X by $b_1$ and adding $b_0$ would give us an estimated value of Y.
That is,

$$\hat{Y}_i = b_1 X_i + b_0,$$

the predicted value of Y for the ith person, denoted by $\hat{Y}_i$, equals $b_1$ times
his X score plus $b_0$. Obviously, $\hat{Y}_i$ will not always equal $Y_i$, i.e., even with
the “best” straight-line prediction equation we shall usually make errors in
predicting Y from X. We say, then, that $Y_i = b_1 X_i + b_0 + e_i$, where $e_i$
is an error of estimating Y from X for the ith person:

$$e_i = Y_i - \hat{Y}_i. \tag{8.1}$$

Another name for $e_i$ is the error of estimate. The nature of the error of
estimate in the prediction problem is illustrated in Fig. 8.4 for a person
whose scores on X (IQ in grade 8) and Y (mathematics score in grade 9)
place him below the estimated Y score for him on the regression line.

The method of choosing $b_1$ and $b_0$ that has been found most useful is to
choose them so that

$$\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \cdots + e_n^2$$

is as small as possible, i.e., so that the sum of the squared errors of estimate
is minimized.

This criterion may strike you as arbitrary, but computationally it is
convenient. Moreover, it is preferred for reasons that you cannot appreciate
fully until you have learned considerably more statistics. However, you
should know that the criterion of least squares (i.e., minimal sum of squared
errors of estimate) is not the only criterion possible. Although there has
never been a serious competitor with it, other criteria for the prediction
line have been proposed. One is that $b_1$ and $b_0$ should be chosen so that the
sum of the absolute values of errors made in prediction is as small as
possible, i.e., so that $|e_1| + \cdots + |e_n|$ is minimized. This criterion leads
to a “median regression line.” (You will recall that the median of a group of
scores is the point around which the sum of the absolute deviations of the
scores from that point is minimal.) Although the median-regression line is
easily calculated, it does not have the inferential theoretical superstructure
that the least-squares regression line has.
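
The parenthetical remark about the median, and the parallel property of the mean under the least-squares criterion, can be illustrated with a simple search. This is a sketch only: the scores are hypothetical, and the crude grid search over candidate points stands in for a proof.

```python
import statistics

# Hypothetical scores, with one extreme value to separate mean from median.
scores = [1, 2, 3, 4, 10]

def sum_abs_dev(c):
    """Sum of absolute deviations of the scores from the point c."""
    return sum(abs(s - c) for s in scores)

def sum_sq_dev(c):
    """Sum of squared deviations of the scores from the point c."""
    return sum((s - c) ** 2 for s in scores)

# Candidate points 0.0, 0.1, ..., 10.0.
candidates = [k / 10 for k in range(101)]

best_abs = min(candidates, key=sum_abs_dev)  # least absolute deviations
best_sq = min(candidates, key=sum_sq_dev)    # least squares
print(best_abs, statistics.median(scores))
print(best_sq, statistics.mean(scores))
```

The two criteria pick different points whenever the distribution is skewed, which is why lines fitted under the two criteria need not coincide.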
The use of the criterion of least squares for establishing a prediction
line is over 150 years old. Karl Gauss (1777-1855), a famed German
mathematician, physicist, and astronomer, is generally credited with having
invented the criterion of least squares. In one form or another it underlies
a large portion of theoretical and applied statistical work.

FIG. 8.4 Illustration of the error of estimate, e, for the person scoring 100
on variable X and 31 on variable Y.

We noted earlier that $e_i$ is the difference between a person’s actual
Y score and the Y score we predict for him:

$$e_i = Y_i - \hat{Y}_i = Y_i - (b_1 X_i + b_0). \tag{8.2}$$

We choose $b_1$ and $b_0$ so that

$$[Y_1 - (b_1 X_1 + b_0)]^2 + [Y_2 - (b_1 X_2 + b_0)]^2 + \cdots + [Y_n - (b_1 X_n + b_0)]^2$$

is as small as it can possibly be. The exact manner in which one determines
the values of $b_1$ and $b_0$ that minimize the above quantity is too complicated
for us to detail. (See McNemar, 1962, pp. 119-24.) We shall simply report
the results here, and in Sec. 8.5 present a verification of the reported
solutions. $b_1$ is given by the following equation:

$$b_1 = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}. \tag{8.3}$$

$b_0$ is given by the equation

$$b_0 = \bar{Y} - b_1 \bar{X}. \tag{8.4}$$

The values of $b_1$ and $b_0$ found in this way give the “best” prediction
equation in the sense that the sum of the squared differences between the
$Y_i$ and the $\hat{Y}_i = b_1 X_i + b_0$ is as small as it can possibly be for these data.
In the last section in this chapter a simple algebraic verification that this is in
fact true is presented.
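
Short of the algebraic verification, the minimizing property can be probed numerically with the data of Table 8.1. The following sketch is an illustration under stated assumptions: the names `sse`, `nearby`, and `e_person` are arbitrary, the slope is computed in deviation-score form (which is algebraically equivalent to the raw-score form), and the handful of perturbed lines tried here is a spot check, not a proof.

```python
# The 20 pairs of Table 8.1: IQ in grade 8 (X), math score in grade 9 (Y).
X = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
     110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
Y = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
     41, 44, 40, 45, 48, 45, 49, 47, 43, 48]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# Least-squares slope (deviation-score form) and intercept.
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
      / sum((x - mean_x) ** 2 for x in X))
b0 = mean_y - b1 * mean_x

def sse(slope, intercept):
    """Sum of squared errors of estimate for the line Y = slope*X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(X, Y))

best = sse(b1, b0)

# Every slightly different line tried here has a larger sum of squared errors.
nearby = [sse(b1 + db, b0 + da)
          for db in (-0.01, 0, 0.01)
          for da in (-0.5, 0, 0.5)
          if (db, da) != (0, 0)]

# Error of estimate for the person scoring 100 on X and 31 on Y.
e_person = 31 - (b1 * 100 + b0)
print(best, min(nearby), e_person)
```

The negative error of estimate for this person means that his actual Y score lies below the score the line predicts for him.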

Suppose that we seek the “best” equation for estimating Y from X by
means of a straight line best fitting (in the least-squares sense) the dots of
the scatter diagram and that we have the X and Y scores for 20 persons as
given in Table 8.1.


Nothing has been said to this point about whether the dependent and
independent variables are normally distributed or distributed in any other
special way. No knowledge of the shapes of the frequency distributions of
X and Y was needed to derive the least-squares regression coefficients $b_1$
and $b_0$. Equations (8.3) and (8.4) for $b_1$ and $b_0$ produce the straight line
that minimizes the sum of squared residuals regardless of the nature of the
scatter diagram of the X and Y scores.


If we make some plausible assumptions about the distributions of large
numbers of X and Y scores, we are rewarded by being able to perform a more
penetrating prediction study. Our study gains much if we can assume that
the n pairs of scores in hand form a random sample from a very large
collection of X and Y scores that has a bivariate normal distribution (see
Sec. 6.6). The properties of the bivariate normal distribution now of
importance are:

1. The means of the Y scores for each separate value of X fall on a
straight line;

2. For each value of X, the associated Y scores are normally
distributed;

3. For each value of X, the associated Y scores have the same variance.

If the data in hand are from a bivariate normal distribution, property 1
above tells us that the use of a straight line for predicting Y from X is
reasonable and cannot be improved upon by any curved line. Properties 2
and 3 above can be combined into a very useful technique that adds much to
the study of estimation problems. This technique will be developed below.


8.3

HOMOSCEDASTICITY AND
THE STANDARD ERROR OF
ESTIMATE


Property 3 of the bivariate normal distribution leads us to believe that, if
we have, say, 19 persons with a score of 75 on X and 21 persons with a
score of 80 on X, the variances of the two groups of associated Y scores
should be about the same. This condition of equal variance of Y scores
for each value of X is known as homoscedasticity (the roots of this word
mean equal spread). The scatter diagram in Fig. 8.5 should help you gain
an understanding of this condition.

It is important to note that homoscedasticity is a property of very large
bodies of bivariate data. One should not expect equality of variances of Y
scores for any two values of X when the n's are small, say of the order of
100 or less. For n's of 19 and 21 the variances of the Y scores for X = 75
and X = 80 in Fig. 8.5 are $s^2_{y \cdot 75} = 5.54$ and $s^2_{y \cdot 80} = 6.85$. These two
variances are not equal, but they are reasonably close. With such small
numbers of persons we cannot ascertain very well whether the condition
of homoscedasticity is satisfied, but at least it seems somewhat plausible
for X equal to 75 and 80 after inspection of the data of Fig. 8.5.

Obviously the sizes of the errors made in estimating Y from X are an
indication of the accuracy of estimation. For the data in hand, i.e., the n
pairs of X and Y scores, the differences between the actual Y scores and the
predicted Y scores are measures of the errors that would result if X is used
to estimate Y. These errors are called errors of estimate. The formula for
the error of estimate for the ith person is

$e_i = Y_i - \hat{Y}_i = Y_i - b_1 X_i - b_0$.    (8.5)


FIG. 8.5 Scatter diagram for
19 persons scoring 75 on X and
21 persons scoring 80 on X, which
exhibits nearly the same variance
of Y scores for both values of X.
In words, the ratio of the variance of the predicted Y scores to the variance of the actual
Y scores is equal to the square of the correlation coefficient between X and Y.
Notice that this ratio tells nothing about the direction of the relationship;
the ratio is never negative. For the data in Table 8.1, $s_y^2 = 25.958$ and
$s_{\hat{y}}^2 = 19.243$. The ratio 19.243/25.958 equals .741, which corresponds to
the square of the value of $r_{xy}$ for these same data. It is often said in an
attempt to explain the meaning of $r_{xy}$ that $r_{xy}^2$ is the “amount of variance in
Y explained by variance in X.” This is ambiguous and nearly meaningless
language without explicit definitions of what it means for a variable to have
its “variance explained,” and yet explicit definitions are usually lacking.
Such “explanations” of $r_{xy}$ are attempts to verbalize Eq. (8.17) beyond the
italicized sentence above.
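The equality between the variance ratio and the squared correlation can be confirmed on any set of scores. The following sketch uses five hypothetical score pairs (not the Table 8.1 data) and computes the slope with the usual least-squares formula, which is assumed here to be what Eq. (8.3) gives; it also repeats the division quoted above for Table 8.1.

```python
from statistics import mean, variance

# Hypothetical scores, not taken from Table 8.1.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx                 # least-squares slope (assumed form of Eq. 8.3)
b0 = y_bar - b1 * x_bar        # Eq. (8.4)
predicted = [b0 + b1 * x for x in xs]

r = sxy / (sxx * syy) ** 0.5   # Pearson product-moment correlation

# Ratio of the variance of predicted Y to the variance of actual Y.
ratio = variance(predicted) / variance(ys)
print(ratio, r ** 2)

# The ratio quoted in the text for Table 8.1.
table_ratio = 19.243 / 25.958
print(round(table_ratio, 3))
```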


The most significant formula for “understanding” what the correlation
coefficient describes is Eq. (8.17). We shall try to interpret the meaning
of $r_{xy}^2$ in a slightly different way.

Suppose we want to predict each person's score on variable Y but
no X variable is available. The predicted score that minimizes the sum
of the squared errors of prediction in predicting Y happens to be $\bar{Y}_{.}$ for
each person. If no X variable is available and nothing is known
about how individuals [...], the best prediction can be attained by
predicting each person's score at the mean, $\bar{Y}_{.}$. A measure of the
goodness of such prediction without an X variable is given by

$\dfrac{\sum (Y_i - \bar{Y}_{.})^2}{n - 1}$,

which happens to equal $s_y^2$.

Suppose now that knowledge of each person's status on X is available for
predicting his Y score. The sum of the squared errors of prediction made in
predicting Y from X with the least-squares regression line is given by

$\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - 1}$,

which is $s_e^2$.

Now we know that $s_{\hat{y}}^2 + s_e^2 = s_y^2$, where $s_y^2$ is the error made
in prediction before knowledge of X is used and $s_e^2$ is the error made when
knowledge of X is used. The amount of error eliminated from $s_y^2$ by knowledge
of X must then equal $s_{\hat{y}}^2$. Recall that [...]
Noting that $s_{\hat{y}}^2 / s_y^2 = r_{xy}^2$, the above equation can be written as:

$r_{xy}^2 = \dfrac{s_y^2 - s_e^2}{s_y^2}$.    (8.18)

The following interpretation of $r_{xy}^2$ can be extracted from Eq. (8.18):


The value $r_{xy}^2$ equals the proportion of the variance of
squared errors, $s_y^2$, made in predicting Y without knowledge of
X that is eliminated when Y is predicted by the least-squares
method from a knowledge of X.


[...] $r_{xy}^2 = .75$ and $s_y^2 = 100$. We know, then, that
the total error made in predicting Y without
knowledge of X, in which case everyone is predicted to have a Y score of
$\bar{Y}_{.}$, is equal to

$\dfrac{\sum (Y_i - \bar{Y}_{.})^2}{n - 1} = s_y^2 = 100$.

The total error made in predicting Y from X with the least-squares regression
line is

$\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - 1} = s_e^2 = 25$.

Therefore, 100 - 25 = 75 is the percent of error eliminated from $s_y^2$ by
knowledge of X. We note also that $r_{xy}^2 = .75$.
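The arithmetic of this example, and the identity behind it, can be sketched in a few lines. The function name below is hypothetical, and the raw scores in the second half are made-up values used only to show that the proportion of error eliminated agrees with the squared correlation computed directly.

```python
def proportion_of_error_eliminated(s_y_sq, s_e_sq):
    """Share of the no-X prediction error (s_y^2) removed by using X."""
    return (s_y_sq - s_e_sq) / s_y_sq


# Values quoted in the text: s_y^2 = 100 and s_e^2 = 25.
r_squared = proportion_of_error_eliminated(100.0, 25.0)
print(r_squared)

# The same check on hypothetical raw scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

s_y_sq = syy / (n - 1)                       # error when everyone is predicted at the mean
s_e_sq = sum((y - (b0 + b1 * x)) ** 2
             for x, y in zip(xs, ys)) / (n - 1)   # error around the regression line

r_sq_from_errors = proportion_of_error_eliminated(s_y_sq, s_e_sq)
r_sq_direct = sxy ** 2 / (sxx * syy)
print(r_sq_from_errors, r_sq_direct)
```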


8.5

VERIFICATION THAT THE
LEAST-SQUARES CRITERION IS
SATISFIED BY $b_1$ AND $b_0$

This algebraic derivation is placed in Appendix C because it will probably
not be of interest to all readers. It has been stated without proof or verification
that Eqs. (8.3) and (8.4) give the values of $b_1$ and $b_0$ that minimize
$\sum [Y_i - (b_0 + b_1 X_i)]^2$, the sum of squared errors of estimate. Appendix
C gives a relatively simple algebraic verification of the fact that Eqs. (8.3)
and (8.4) for $b_1$ and $b_0$ produce the smallest possible sum of squared errors
of estimate. Hopefully, this demonstration will satisfy some curious minds
while imparting a deeper understanding of the least-squares criterion as well.
We emphasize that the proof is not the means by which Eqs. (8.3) and (8.4)
were originally derived. Our proof is an after-the-fact verification, given the
Eqs. (8.3) and (8.4) for $b_0$ and $b_1$.


8.6

MEASURING NONLINEAR
RELATIONSHIPS BETWEEN
VARIABLES; THE
CORRELATION RATIO $\eta^2$


This section appears here for the sake of completeness and logical continuity.
You may find it more comprehensible after having studied Chapter 15 on
the one-factor analysis of variance.

Although we have repeatedly pointed out that the Pearson product-moment
r measures only the degree of linear relationship between X and Y,
we have yet to indicate a descriptive measure to use when the relationship
between X and Y is predominantly nonlinear. As an example of a nonlinear
relationship, [...] age
to performance [...]
of the Wechsler Adult Intelligence Scale.

The data depicted in Fig. 8.7 are tabulated in Table 8.2. It is obvious from


TABLE 8.2 DIGIT SYMBOL SUBTEST SCORES (Y) OF 28 PERSONS, BY AGE-GROUP (X)

                              Age-group (X)
           10      14      18      22      26      30      34      38

            7       8       9      11       9       8       7       8
            8       9      10      11      10       9       9
            9      10      11      12      11       9      10
            9      11      12      12              10
           10

Mean     8.60    9.50   10.50   11.50   10.00    9.00    8.67    8.00

Grand mean of all 28 scores = 9.61




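[...] straight-line fashion from age
10 to a peak at age 22 and [...].
A measure of the nonlinear relationship between X and Y is
denoted by $\eta^2_{y,x}$ (read “eta squared”) [...]. The
correlation ratio has the following definitional formula:

$\eta^2_{y,x} = 1 - \dfrac{SS_{within}}{SS_{total}}$,    (8.19)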
FIG. 8.7 Relationship between age and performance of 28 persons on the
Digit Symbol subtest of the WAIS.


where $SS_{total} = \sum_{i=1}^{n} (Y_i - \bar{Y}_{.})^2$, i.e., the sum of squared deviations of each
Y score from the mean of all n Y scores, and
$SS_{within}$ is obtained in the following manner.

For the first value attained by X, the corresponding Y scores are deviated
around their own mean and the sum of the squared deviations is calculated.
For example, in Table 8.2 this first sum of squares is $(7 - 8.60)^2 + (8 - 8.60)^2
+ (9 - 8.60)^2 + (9 - 8.60)^2 + (10 - 8.60)^2$. This process is repeated
for each distinct value that X assumes. For example, for X = 14,
one calculates $(8 - 9.50)^2 + (9 - 9.50)^2 + (10 - 9.50)^2 + (11 - 9.50)^2$.
For the last group, X = 38, the sum of the squared deviations of the Y
“scores” around their mean is $(8 - 8)^2 = 0$, since there is only one score.
Finally, these sums of squared deviations for the separate values of X are
summed. The result is $SS_{within}$. (If you are reading this section after having
read Chapter 15, it will help you to note that $SS_{within}$ is the “sum of squares
within” for a one-factor analysis of variance with unequal n's.)

For the data in Table 8.2, the value of $SS_{total}$ is 54.68 and the value of
$SS_{within}$ is 24.87. Hence the value of $\eta^2_{y,x}$ is

$\eta^2_{y,x} = 1 - \dfrac{24.87}{54.68} = 1 - .455 = .545$.
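The calculation just described can be reproduced directly. In the sketch below the Y scores are those of Table 8.2 as far as they can be read from the column means; the age labels for the groups between 14 and 38 are assumed to be evenly spaced, and the variable and function names are illustrative only. The labels do not affect the result, since only the grouping matters.

```python
# Digit Symbol scores grouped by age (Table 8.2); intermediate age labels assumed.
scores_by_age = {
    10: [7, 8, 9, 9, 10],
    14: [8, 9, 10, 11],
    18: [9, 10, 11, 12],
    22: [11, 11, 12, 12],
    26: [9, 10, 11],
    30: [8, 9, 9, 10],
    34: [7, 9, 10],
    38: [8],
}


def sum_squared_deviations(values):
    """Sum of squared deviations of the values around their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


all_scores = [y for group in scores_by_age.values() for y in group]

ss_total = sum_squared_deviations(all_scores)
ss_within = sum(sum_squared_deviations(g) for g in scores_by_age.values())
eta_squared = 1 - ss_within / ss_total

print(round(ss_total, 2), round(ss_within, 2), round(eta_squared, 3))
```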


The following considerations bear on the interpretation of $\eta^2_{y,x}$. The
coefficient $\eta^2_{y,x}$ (notice that Y precedes the comma and X follows it) is a
measure of the extent to which Y is predictable from X by a “best-fitting”
line that may be either straight or curved.

It is important to note that $\eta^2_{y,x}$ and $\eta^2_{x,y}$ will generally have different
values. This is contrary to our experiences with r, for which $r_{xy} = r_{yx}$. We
can give the fact that $\eta^2_{y,x}$ may not equal $\eta^2_{x,y}$ some intuitive appeal with the
data in Table 8.2. If a person’s age is 10, his Digit Symbol score can be
fairly confidently predicted to be about 8.60. However, if we know that a
person’s Y score is 8, his age X may either be low, around 10, or high,
around 38. Hence, Y can be predicted from X reasonably well; but X cannot
be predicted well from Y. These facts are reflected in the values of
$\eta^2_{y,x} = .545$ and $\eta^2_{x,y}$, which we have not calculated but which is close to
zero.

The value of $\eta^2_{y,x}$ should be compared with the value $r_{xy}^2$ instead of $r_{xy}$.
We saw that $r_{xy}^2 = 1 - (s_e^2/s_y^2)$, which is equal to

$r_{xy}^2 = 1 - \dfrac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y}_{.})^2}$.    (8.20)

Equation (8.20) shows $r_{xy}^2$ to be 1 minus the ratio of the sum of squared deviations
of the Y scores around a straight prediction line to $(n - 1)s_y^2$.
Equation (8.19) shows $\eta^2_{y,x}$ to be 1 minus the ratio of the sum of squared deviations
of the Y scores around a curved prediction line that passes through the mean
of the Y scores for each value of X to $(n - 1)s_y^2$. The curvilinear
prediction line for predicting Y from X appears in Fig. 8.7.

As with $r_{xy}^2$, $\eta^2_{y,x}$ must always be less than or equal to 1 and greater
than or equal to 0. Furthermore, $\eta^2_{y,x} \geq r_{xy}^2$. The difference $\eta^2_{y,x} - r_{xy}^2$ is
a measure of the degree of nonlinearity of a best-fitting line for predicting
Y from X. (See Glass and Hakstian, 1969.)


ADDITIONAL READING ON
PREDICTION

A source for the student who wishes to
pursue the study of statistical prediction beyond this text is William
Rozeboom's Foundations of the Theory of Prediction (Homewood, Illinois:
The Dorsey Press, 1966). The correlation ratio and related
matters can be pursued in Haggard's Intraclass Correlation and the
Analysis of Variance (1958).


PROBLEMS AND EXERCISES


1. [...] the following equations:
   b. $Y = 1 - \frac{1}{3}X$.
   d. [...] (Divide both sides by 2.)
   [...] (Multiply the right side out.)

2. What are the values of the Y-intercept and the slope of the straight line
   that passes through the points (X = [...], Y = 2) and (X = 3, Y = [...])?





3. A particular high school has determined a prediction equation for predicting
   college grade-point-average at the state university from high-school grade-
   point-average. The equation is $\hat{Y} = .76 + .62X$. Predict the college grade-
   point-average which would be earned by students with the following high-school
   grade-point-averages:

   a. 3.50   b. 1.68   c. 2.10   d. 4.00

4. The following are arithmetic test scores and final-exam scores for 12 students
   in an elementary statistics course:

   Student number   X, Arithmetic test   Y, Final exam

   [...]

   Find $b_0$ and $b_1$ in [...] arithmetic test score was 36.
   [...] score was [...], what is the error of
   estimate?)

5. Find the value of the standard error of estimate, $s_e$, for the data [...].

6. In Table 8.1, calculate the error of estimate, $Y - \hat{Y}$, for that
   person whose X score is [...] and whose Y score is 35.

7. In a particular [...], $\hat{Y} = 6.4X + 32.5$, and the standard error
   of estimate equals [...]. [...] X and Y have a bivariate normal
   [...] within which lie the middle 50% of the Y
   [...] persons whose score on X is 10, i.e., determine $Y_1$
   and $Y_2$ in the following figure.
Partial, part, and multiple correlation will be discussed briefly in the
latter sections of this chapter.


9.2 

AN OVERVIEW OF THE 
CHAPTER 


Four types of measurement of variables will be distinguished: 

1. Nominal-dichotomous measurement. The presence or absence of
[...] 1's and 0's. The order of scoring is
arbitrary. Examples: Republican (1), Democrat (0); sibling in
school (1), no sibling in school (0); male (1), female (0); married (1), not
married (0).

2. Dichotomous measurement with an underlying normal distribution
[...] distinguished [...]
[...] a score of 120 (call this 1)
or fell below (call this 0) [...] would be dichotomized data with
an underlying normal distribution. Of course, it would be inefficient
to discard original scores on [...] and record 1's and 0's instead, and this
is not normally done (although [...] of factor analysis it was
computationally convenient). Generally, one assumes that a more
[...] developed measuring [...]
normally distributed scores even
though the device in hand allows only dichotomous observation. Examples:
IQ above 100 (1), below 100 (0); above average height (1), below average
height (0).

3. Ordinal measurement. [...] converted measures from some other
sort of observations (as when the raw [...] 24, 97, [...]),
or they may be the first [...] perceptions into numbers (as when a
judge ranks 10 contestants from most [...], 1, down to least [...],
10). Example: the 94 members of a [...] class are ranked from 1 to
94 on the basis of their 94 grade-point averages.

4. Interval or ratio measurement. [...] (e.g., degrees
on the centigrade scale, inches, etc.) [...] (in the case of ratio measurement)
a zero point on a scale of measurement corresponds to the absence (i.e.,
zero amount) of the variable being measured. Any real number may result
from the act of measurement, and differences between scores reflect on the
differences in amount of the characteristic possessed. In this chapter we
shall generally regard the interval and ratio scores to be approximately
normally distributed, though of course in some situations they may not be.
Examples: height; intelligence-test scores; achievement-test scores; certain
measures in psychological experiments.

If measurement can be accomplished at the interval- or ratio-scale level,
the scores can be transformed into any of the other three levels above. For
example, suppose that ten students earned scores on a test of verbal reasoning,
which is believed to produce an approximately normal distribution of
measurements roughly on an interval scale; see Table 9.1.


TABLE 9.1 CHANGING TEN INTERVAL-SCALE SCORES INTO RANKS
(ORDINAL SCALE) AND 0, 1 (NOMINAL SCALE)

                                                 Dichotomized scoring
               Verbal reasoning                  (normal distribution
Student no.         scores         Rank scores       underlying)

     1                17                3                 1
     2                10                7                 0
     3                29                1                 1
     4                16                4                 1
     5                 3               10                 0
     6                14                5                 1
     7                 9                8                 0
     8                26                2                 1
     9                 6                9                 0
    10                11                6                 0


The first column gives an identification number for the student. The
raw scores that are assumed to come from an approximately normal
distribution appear in the second column. These ten scores were ranked to
obtain the third column. In the fourth column, the top five raw scores
were given a 1; the bottom five raw scores were given a 0. It does not seem
proper in this example for nominal-dichotomous scores to be assigned (though
they would be identical to the scores in the fourth column), because we have
reason to believe that a normal distribution underlies the dichotomy, or at
least we know that a ten-category distribution does.

Where there are two sets of scores, $X_i$ and $Y_i$, for each of n individuals,
either X or Y could be measured in any one of the four manners outlined
above. Thus there are 4 × 4 = 16 possible pairs of descriptions of the
measurement of two variables that are to be correlated. These 16 possible
pairs of conditions on X and Y can be represented as in Table 9.2.
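The two transformations described for Table 9.1 can be sketched as follows. The raw scores are those of the table; the variable names are illustrative, and the simple ranking used here assumes there are no tied scores, which holds for these ten values.

```python
# Verbal reasoning scores for students 1 through 10 (Table 9.1).
scores = [17, 10, 29, 16, 3, 14, 9, 26, 6, 11]

# Rank 1 goes to the highest score; valid as written only when there are no ties.
ordered = sorted(scores, reverse=True)
ranks = [ordered.index(s) + 1 for s in scores]

# Top five raw scores receive a 1, bottom five receive a 0.
cutoff = ordered[len(scores) // 2 - 1]      # the fifth-highest score
dichotomized = [1 if s >= cutoff else 0 for s in scores]

print(ranks)
print(dichotomized)
```

Each step discards information: the ranks keep only the order of the scores, and the dichotomy keeps only which half of the group a score falls in.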
TABLE 9.2

                                    Measurement of variable X
                                      Dichotomized,
Measurement of                        with underlying               Interval
variable Y             Dichotomous    normal            Ordinal     or ratio

Dichotomous                 A              (B)             (C)         (D)
Dichotomized, with
underlying normal           B               E              (F)         (G)
Ordinal                     C               F               H          (I)
Interval or ratio           D               G               I           J


It is necessary to consider only 10 of the 16 possible pairs, since the
designation of the two variables being correlated as X or Y is entirely
arbitrary (because $r_{xy} = r_{yx}$). In terms of correlational theory, the six cells
with letters in parentheses in Table 9.2 are the same as the cells with the same
letters not in parentheses. Table 9.2 will be the structure upon which the
following discussion of several specific measures of relationship will be built.
One appropriate correlational measure for two interval or ratio variables (J)
was the subject of the previous chapter. This is an instance in which the
Pearson product-moment coefficient $r_{xy}$ is used. We shall consider some of
the remaining nine cases (A-I) below.


9.3 

MEASURES OF RELATIONSHIP 
Case A 


Both variables yield nominal-dichotomous measures: the phi coefficient, $\phi$.
In this case, both X and Y have been measured dichotomously. The data
can be thought of as arranged in two columns of 0’s and 1’s where each row
corresponds to one person’s two scores. For example, 12 students in
academic trouble in their sophomore year of college might be observed on
the variables marital status and “dropped out of college”; see Table 9.3.
Arbitrarily 1 means married and 1 means dropped out, with 0 for not
married and 0 for remaining in school. One measure of the relationship
between X and Y is simply $r_{xy}$, the Pearson product-moment coefficient.
The Pearson product-moment coefficient calculated on nominal-dichotomous
data is called the phi coefficient and is denoted by $\phi$. The value of $\phi$ for the
data in Table 9.3 is .507; but this was not found with the usual computation
formula for $r_{xy}$. That formula can be replaced with a still simpler but
algebraically identical formula when the data on X and Y are dichotomous.





TABLE 9.3 ILLUSTRATION OF THE CALCULATION OF THE PHI
COEFFICIENT, $\phi$

                        X                          Y
                  Marital status               Attrition
                  (married, 1;              (dropped out, 1;
Student no.       not married, 0)             remained, 0)

[...]

$\phi = \dfrac{.3333 - (.4167)(.5000)}{\sqrt{(.4167)(.5833)(.5000)(.5000)}} = .507$


Let $p_x$ be the proportion of persons scoring 1 on X; the proportion
scoring 0 on X will be equal to $1 - p_x$ and is
denoted by $q_x$. Similarly, $p_y$ is the proportion scoring 1 on Y and
$q_y = 1 - p_y$. One more definition is necessary: $p_{xy}$ is
the proportion of people scoring 1 on both X and Y. If we were to operate
on the formula for $r_{xy}$ with these definitions, we would find that it
reduces to the following convenient form:

$\phi = \dfrac{p_{xy} - p_x p_y}{\sqrt{p_x q_x p_y q_y}}$.    (9.1)

Equation (9.1) is a convenient way to compute the phi coefficient. The
[...] product-moment correlation [...]

$r_{xy} = \dfrac{(1/n)\sum X_i Y_i - \bar{X}_{.}\bar{Y}_{.}}{\sqrt{[(1/n)\sum X_i^2 - \bar{X}_{.}^2][(1/n)\sum Y_i^2 - \bar{Y}_{.}^2]}}$.    (9.2)

If X and Y are measured dichotomously, $\bar{X}_{.}$ and $\bar{Y}_{.}$ are simply the
proportions of 1's on each variable. For example, $\bar{X}_{.} = p_x = .4167$
in the illustration in Table 9.3. $X_i Y_i$ is different from zero only when the ith
person scores 1 on each variable, in which case $X_i Y_i = 1$. Therefore,
$\sum X_i Y_i$ is simply a count of the number of persons scoring 1 on both
X and Y; hence $(1/n)\sum X_i Y_i = p_{xy}$. Since X is 0 or 1 and because $0^2 = 0$ and
$1^2 = 1$,

$\dfrac{\sum X_i^2}{n} = \dfrac{\sum X_i}{n} = p_x$.

Therefore, substituting such expressions into Eq. (9.2) produces

$r_{xy} = \dfrac{p_{xy} - p_x p_y}{\sqrt{(p_x - p_x^2)(p_y - p_y^2)}} = \dfrac{p_{xy} - p_x p_y}{\sqrt{p_x q_x p_y q_y}} = \phi$.    (9.3)

When one has no particular interest in the proportions $p_x$ and $p_y$ and
finds it more convenient to tabulate dichotomous bivariate data in a
contingency table (a table showing the joint occurrences of pairs of scores on
two variables in a group), $\phi$ can be calculated with a convenient raw score
formula. The data in Table 9.3 can be represented as in Fig. 9.1.


                              Marital status
Attrition            Not married (0)    Married (1)    Totals

Dropped out (1)             2                4            6
Remained (0)                5                1            6

Totals                      7                5           12


FIG. 9.1 Contingency table for the data in Table 9.3.


Figure 9.1 presents the frequencies of persons, showing the four possible
pairs of characteristics in Table 9.3. For example, five persons in Table 9.3
were not married and remained in school during their sophomore year. The
marginal totals for rows show the numbers of persons at both levels of
“attrition,” irrespective of their marital status. What is the interpretation
of the column totals?

Suppose that in each cell of a contingency table of the above sort we
substitute a letter for the actual frequencies so that we can deal with the
computation of $\phi$ more generally. See Fig. 9.2.

The number of persons scoring 0 on X and 1 on Y is denoted by a.
The total number of persons scoring 0 on X is a + c. The total number of
persons represented in the table is n. How many people scored 0 on Y and
0 on X?

It can be shown by substituting such equivalences as $p_x = (b + d)/n$,
$p_y = (a + b)/n$, and $p_{xy} = b/n$ into Eq. (9.1) that the phi coefficient for the
FIG. 9.2 General form of a 2 × 2
contingency table.

                        Variable X
                        0          1        Totals

Variable Y     1        a          b        a + b
               0        c          d        c + d

          Totals      a + c      b + d        n


data arranged in a contingency table like that of Fig. 9.2 is

$\phi = \dfrac{bc - ad}{\sqrt{(a + c)(b + d)(a + b)(c + d)}}$.    (9.4)

Equation (9.4) was first derived by Karl Pearson in 1901 in a paper,
published in the Philosophical Transactions of the Royal Society of London,
[...] variables that cannot be measured
quantitatively. To illustrate the calculation of $\phi$ by Eq. (9.4), the data in
Fig. 9.1 will be used:

$\phi = \dfrac{20 - 2}{\sqrt{(7)(5)(6)(6)}} = .507$,

which is equal to the value found in Table 9.3 for the
same data. This will always result, because Eqs. (9.1) and (9.4)
are algebraically equivalent.
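The agreement between the proportion formula and the cell-frequency formula can be checked with the frequencies of Fig. 9.1. This is a minimal sketch: the cell labels a, b, c, d follow the layout of Fig. 9.2, and the variable names are illustrative.

```python
from math import sqrt

# Cell frequencies from Fig. 9.1, labeled as in Fig. 9.2:
# a = (X=0, Y=1), b = (X=1, Y=1), c = (X=0, Y=0), d = (X=1, Y=0)
a, b, c, d = 2, 4, 5, 1
n = a + b + c + d

# Proportions of 1's on X, on Y, and on both.
p_x = (b + d) / n
p_y = (a + b) / n
p_xy = b / n

# Eq. (9.1): phi from proportions.
phi_from_proportions = (p_xy - p_x * p_y) / sqrt(
    p_x * (1 - p_x) * p_y * (1 - p_y)
)

# Eq. (9.4): phi from cell frequencies.
phi_from_counts = (b * c - a * d) / sqrt(
    (a + c) * (b + d) * (a + b) * (c + d)
)

print(round(phi_from_proportions, 3), round(phi_from_counts, 3))
```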


Properties of $\phi$. [...] of the phi coefficient [...] is
simply the Pearson product-moment coefficient of correlation for dichotomous
data. However, the interpretation of $\phi$ can present special problems.
In Chapter 7 it was seen that in interpreting $r_{xy}$, certain assumptions,
several of which can be [...] by [...] a scatter diagram
of the data, must be made. The [...] calculation
of $\phi$ are quite a departure from the two-variable normal surface to which
the interpretation of $r_{xy}$ has reference. Consequently, it is wrong to think that
the interpretive meaning [...] is the same.
Especially pertinent is the fact that $\phi$ can assume the value +1
only when a + b and b + d are equal (and hence a = d) in the 2 × 2
contingency table, i.e., when there are the same proportions of 1’s on both
X and Y.* Notice that in Fig. 9.3, where $p_x$ = .2 and $p_y$ = .4, no higher
positive relationship between X and Y than is shown can be obtained. All
[...] 1's [...] paired with 1's on X, but because there are 20 1's
on X but only 10 1's on Y, there must be 10 1's on X that are paired with 0's
on Y. [...] predicted perfectly from X because knowledge that a
[...] the possibility that his score on Y
[...] 0. The value [...] Fig. 9.3 is .41. In other words, as long as 10 of the
25 scores on Y are 1's and 20 of the 25 scores on X are 1's (or vice versa),
[...] greater than .41. This is regarded by some statisticians as a
[...]. [...] attempt to sidestep this undesirable
[...] and others have divided the obtained value of
$\phi$ by the maximum [...] maximum $\phi$ for [...] distribution
[...] range [...] that can always
[...] the proportions of
1's on X and Y are. [...] discussed at length and found it
wanting. The basic principle [...] product-moment r
can attain the value +1 only if the distribution of the X's has the same shape
as that of the Y's. [...] can be obtained only if the shape of the
distribution of the X's is exactly the [...] of the Y's; for $\phi$ this would
mean that $p_x = q_y$. [...] or reversed distributions
are necessary to attain the limiting [...], but they are by no means
sufficient. Even if X and Y are distributed identically, they can correlate 0
with each other, of course.

* Or, what is the same, when a + c = c + d, and therefore a = d. This is one
necessary condition for $\phi$ [...] a = d = 0. The 0, 1 and 1, 0
quadrants must be empty.

FIG. 9.3 The greatest possible positive association when $p_x$ = .2
and $p_y$ = .4.


Case B


X is dichotomous; Y is dichotomous with an underlying normal distribution.
We know of no one satisfactory coefficient for correlating variables measured
in these ways. Perhaps for most [...] to be infrequent,
one should forgo the normality assumption and calculate $\phi$. Quilling (1969)
has discussed this problem.
Case C


X is dichotomous and Y is ordinal. The only available coefficients for this
situation are due to Cureton (1956) and Glass (1966). (Also see Stanley,
1968b.) Because these coefficients build upon rationales first proposed for
correlating two ordinal variables, they can best be discussed after Case I
later in this section.


Case D 


One variable yields nominal-dichotomous measures, the other yields interval
or ratio measures: the point-biserial correlation coefficient, $r_{pb}$. In this case,
one variable is measured dichotomously (e.g., sex, marital status) and the
measurement of the other variable produces a collection of scores with interval
[...]. [...] a group of high-school juniors, we
might observe whether a student drops out of college (0) during the freshman
[...] or remains in college (1), and also measure his intelligence. [...]

One means of describing the relationship between X and Y is simply
[...] Pearson product-moment coefficient on the data as they are.
Such a coefficient is called a point-biserial correlation coefficient and is denoted
by $r_{pb}$. (The term biserial refers to the fact that there are two series of persons
being observed on X: those [...] on Y.
[...] for this coefficient are due to Karl
Pearson. The expression product-moment biserial is sometimes used instead
of point-biserial.) A simplified formula for $r_{pb}$ follows.


$r_{pb} = \dfrac{\bar{X}_{.1} - \bar{X}_{.0}}{s_x} \sqrt{\dfrac{n_1 n_0}{n(n - 1)}}$,    (9.5)

where $\bar{X}_{.1}$ is the mean on X of those who scored 1 on Y,
$\bar{X}_{.0}$ is the mean on X of those who scored 0 on Y,
$s_x$ is the standard deviation of all n scores on X,
$n_1$ is the number of persons scoring 1 on Y,
$n_0$ is the number of persons scoring 0 on Y, and
$n = n_1 + n_0$.

Equation (9.5) represents an algebraic simplification of the Pearson
product-moment correlation coefficient formula when Y is a dichotomous
variable. [...] one of several simplifications that could
[...]. [...] the following formulas is equivalent to Eq. (9.5),
and one of them will probably be more convenient than the others for a
particular problem.
$r_{pb} = \dfrac{\bar{X}_{.1} - \bar{X}_{.}}{s_x} \sqrt{\dfrac{n_1 n}{n_0 (n - 1)}}$,    (9.6)

where $\bar{X}_{.}$ is the mean of all n scores on X.

$r_{pb} = \dfrac{\bar{X}_{.} - \bar{X}_{.0}}{s_x} \sqrt{\dfrac{n_0 n}{n_1 (n - 1)}}$.    (9.7)

As is particularly evident from consideration of Eq. (9.5), $r_{pb}$ is a measure
of the difference between the average scores on X of the persons scoring 1
on Y and the persons scoring 0 on Y. Because $r_{pb}$ is nothing more than the
product-moment correlation coefficient calculated on particular types of data,
it must take on some value from -1 to +1, inclusive. When those scoring
1 on Y have the same average value on X as those scoring 0 on Y, $r_{pb}$ will
be zero. Of course, $r_{pb}$ is not defined if either $n_0$ or $n_1$ equals n; covariation
cannot be studied if variation (on Y, in this case) does not exist.

The calculation of $r_{pb}$ is illustrated in Table 9.4 with data on the sex and
height of 15 persons.


TABLE 9.4 ILLUSTRATION OF THE CALCULATION OF THE POINT-BISERIAL
CORRELATION COEFFICIENT

               Y                        X
              Sex                     Height
Person   (1, male; 0, female)       (in inches)

  A            1                        59
  B            0                        67
  C            1                        63
  D            1                        65
  E            0                        55
  F            1                        72
  G            0                        62
  H            0                        60
  I            1                        64
  J            1                        66
  K            1                        63
  L            0                        61
  M            1                        62
  N            0                        63
  O            0                        60

$n_1 = 8$, $n_0 = 7$, $n = 15$
$\bar{X}_{.1} = 64.25$, $\bar{X}_{.0} = 61.14$, $\bar{X}_{.} = 62.80$, $s_x = 3.91$

Calculating $r_{pb}$:

1. Equation (9.5):

   $r_{pb} = \dfrac{64.25 - 61.14}{3.91} \sqrt{\dfrac{8(7)}{15(14)}} = .41$

2. Equation (9.6):

   $r_{pb} = \dfrac{64.25 - 62.80}{3.91} \sqrt{\dfrac{8(15)}{7(14)}} = .41$

3. Equation (9.7):

   $r_{pb} = \dfrac{62.80 - 61.14}{3.91} \sqrt{\dfrac{7(15)}{8(14)}} = .41$


The boys are taller than the girls, on the average
(64.25 versus 61.14 inches), but the relationship found between sex and height
is only moderate (.41). Just one girl is taller than the average boy, but six
of the seven girls are taller than the shortest boy.
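The three point-biserial formulas can be compared with the ordinary Pearson coefficient on the Table 9.4 data. In the sketch below the heights are assigned to persons A through O in the order they appear in the table, which reproduces the reported means and standard deviation; the helper `pearson_r` is an illustrative function written for this example, not a library routine.

```python
from math import sqrt
from statistics import mean, stdev

# Table 9.4: sex (1 = male, 0 = female) and height in inches, persons A through O.
sex = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
height = [59, 67, 63, 65, 55, 72, 62, 60, 64, 66, 63, 61, 62, 63, 60]

males = [h for h, s in zip(height, sex) if s == 1]
females = [h for h, s in zip(height, sex) if s == 0]
n1, n0, n = len(males), len(females), len(height)
s_x = stdev(height)            # standard deviation with n - 1 in the denominator

r_eq_95 = (mean(males) - mean(females)) / s_x * sqrt(n1 * n0 / (n * (n - 1)))
r_eq_96 = (mean(males) - mean(height)) / s_x * sqrt(n1 * n / (n0 * (n - 1)))
r_eq_97 = (mean(height) - mean(females)) / s_x * sqrt(n0 * n / (n1 * (n - 1)))


def pearson_r(xs, ys):
    """Ordinary product-moment correlation, computed from deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_direct = pearson_r(height, sex)
print(round(r_eq_95, 2), round(r_eq_96, 2), round(r_eq_97, 2), round(r_direct, 2))
```

All four values agree, which illustrates that the point-biserial coefficient is the product-moment coefficient applied to one dichotomous and one continuous variable.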

Case E 


Both variables are dichotomous with underlying normal distributions: the
tetrachoric correlation coefficient, $r_{tet}$. In some instances, we think we know
a great deal about the variable being measured, even though we can make
only very crude measurements of it. For example, a test item is written to
measure syllogistic reasoning power. The writer of the item believes that the
ability to draw correct conclusions in a variety of syllogisms is a normally
distributed trait, but the single test item will allow him to identify only a
group of those who answer correctly (all of whom will be given a score of 1)
and a group who do not (all of whom will be given a 0). As a second example,
suppose that the heights of 1000 boys are approximately normally distributed.
The researcher may choose to give a 1 to those taller than 5'2" and a 0 to
those under that height, as in Fig. 9.4. Surely he is discarding information,
but he may gain a great amount of computational ease with tolerable loss
of information, especially if his n is large. (This device was frequently used
in psychometrics before the wide availability of mechanical computing
equipment. Faced with the problem of computing several hundred correlation
coefficients, it proved to be expedient to employ the methods discussed
below as a short-cut approximation.)

Two variables X and Y are measured dichotomously on a group of
persons, although it is believed that more costly and extensive operations
could produce nearly normal distributions of measurements. Only 0 and 1
data are available, but the researcher’s interest is in the correlation of X and
Y he would have obtained if he had gathered the normally distributed
measures. (The phi coefficient will usually underestimate this relationship.)

If he feels that his understanding of the underlying variables (i.e., that they
produce normally distributed measurements) entitles him to more information,
he will choose to calculate the tetrachoric coefficient of correlation, $r_{tet}$,
between X and Y.

FIG. 9.4 Transforming normally distributed scores into dichotomous scores.

The observable data are dichotomous scores (0 or 1) for each person on
X and Y. At this first stage, the data are in the same form as when a phi
coefficient is calculated. All of the explicit information in the data is
preserved when they are placed in a 2 × 2 contingency table like that of Fig. 9.2.

The most exact formula for $r_{tet}$ uses the frequencies a, b, c, and d in the
2 × 2 contingency table to obtain an approximation to the value of $r_{xy}$ that
could be calculated if more sophisticated measurement of X and Y were
possible. This formula was derived by Karl Pearson in 1900. Unfortunately,
it is very complex, so we must settle for a more convenient though less exact
approximation.


If you have some facility with trigonometry, you will have no trouble
understanding the following formula for approximating $r_{tet}$:

$r_{tet} = \cos\left(\dfrac{180^\circ}{1 + \sqrt{bc/ad}}\right)$.    (9.8)
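For readers who prefer to evaluate the cosine approximation directly, a minimal sketch follows. It assumes Eq. (9.8) has the form cos(180° / (1 + sqrt(bc/ad))), with the cells labeled as in Fig. 9.2; the function name is hypothetical. The frequencies used are those of the two-item example discussed in this section (64 persons missing both items, 25 answering both correctly, and 5 and 6 in the remaining cells).

```python
from math import cos, radians, sqrt


def tetrachoric_approx(a, b, c, d):
    """Cosine approximation to the tetrachoric correlation (assumed form of Eq. 9.8).

    Cells follow Fig. 9.2: a = (X=0, Y=1), b = (X=1, Y=1),
    c = (X=0, Y=0), d = (X=1, Y=0). Requires a and d to be nonzero.
    """
    angle_in_degrees = 180.0 / (1.0 + sqrt((b * c) / (a * d)))
    return cos(radians(angle_in_degrees))


# Frequencies of the two-item example: bc/ad = (25)(64)/((5)(6)) = 53.33.
r_tet = tetrachoric_approx(a=5, b=25, c=64, d=6)
print(round(r_tet, 2))

# When bc equals ad the angle is 90 degrees and the approximation is essentially zero.
r_zero = tetrachoric_approx(a=10, b=10, c=10, d=10)
print(abs(r_zero) < 1e-12)
```

Because the formula divides by ad, it cannot be used as written when either of those cells is empty.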

[...] to the right in the above equation,
[...] angle measured in degrees [...] trigonometric functions to find the cosine of the
angle. [...] you will find it more convenient
to refer the value of (bc)/(ad) or (ad)/(bc) to Table H in Appendix A, in
which the [...] Eq. (9.8) have been carried out.

[...] 100 persons [...]
given a score of 1 if he answered an item correctly and a 0 if he answered it
incorrectly or omitted it. Figure 9.5 contains a tabulation of the numbers
of persons scoring each [...] of correct and incorrect on the two items.
The number of persons answering both items incorrectly was 64; five persons
answered item 2 correctly and item 1 incorrectly. The value of (bc)/(ad)
for the table (Fig. 9.5) is (64)(25)/(5)(6) = 53.33. [...]
is referred to Table H in Appendix A. From Table H,
we find that the approximate value of $r_{tet}$ for a value of (bc)/(ad) equal to
53.33 is .93. This is an approximation to the value of the product-moment
correlation coefficient between the two normally distributed variables for which
test items 1 and 2 provided only dichotomous measures. As a second
example, suppose that for a certain 2 × 2 [...]
[...] negative. From Table H [...]

FIG. 9.5 Frequencies of pairs of
scores on two test items.

[...] Table H can [...] substantial errors (see Bouvier et al.,
1954). Neither method should be used if (a + b)/n or (b + d)/n (i.e., $p_y$ or
$p_x$) departs greatly from .50. [...] Table H will not be in
error by more than .04 for [...] values of $r_{tet}$ between -.90 and +.90. When the
ratios deviate from .50 [...] values are over-estimates of the true value
of $r_{tet}$, when it is positive. [...] (a + b)/n or (b + d)/n is greater than
.70 or less than .30, the [...] for finding $r_{tet}$ should not be
used. Tables prepared by [...] (1955) should be used in such cases.
The illustration in Fig. 9.5 [...] of Table H represents about the most
extreme proportion of 1's on either [...] for which Table H should be
used.

An interesting development in the theory of correlation was the
derivation of a “polychoric” [...] coefficient. Instead of transforming
two normal distributions into dichotomies (i.e., just two scoring categories),
[...] polychotomies having more than two

each could be transferal then on e of estimating r„ between the normally 
categories. The problem contingency table, with several rows and 

distributed variables given o y ^ ^ so|ulion t0 this problem 

several columns, relahng tb 8 Hamdan (1964) . and though it fills a 

theore^ca^gap^the^vomputational labor involved requires the use of 

electronic computers. _| to +1. One particularly ad- 

The limits on the “ j , ha , its maximum and minimum 

vantageous feature o '^‘““ow far (u + or (» + <0/n depart 
values are +1 and 8 mak<!S ^ supcrior to the phi c0 ' mci ="‘ 

from equality. Th,s P P J „ he „ normal distributions underlying the 
as a measure of K ' Carroll. 1961.) 
dichotomies can be assumeu. 
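The arithmetic behind Eq. (9.8) can be sketched in a few lines of Python. This is only an illustration of the cosine approximation, not a substitute for Table H. The function name is hypothetical, and the cell labeling is an assumption: b and c are taken to be the "agreeing" cells (both 1's and both 0's) and a and d the "disagreeing" cells, which is the labeling under which the two-item example gives bc/ad = 53.33.

```python
import math

def tetrachoric_cos_approx(a, b, c, d):
    """Cosine approximation to the tetrachoric r, following Eq. (9.8).

    Assumed labeling: b and c are the agreeing cells (1,1 and 0,0);
    a and d are the disagreeing cells.
    """
    if a * d == 0:
        # bc/ad grows without bound, the angle shrinks to 0, and cos -> +1
        return 1.0
    ratio = (b * c) / (a * d)
    angle_in_degrees = 180.0 / (1.0 + math.sqrt(ratio))
    return math.cos(math.radians(angle_in_degrees))

# Two-item example: 64 and 25 persons agree, 5 and 6 disagree
r_tet = tetrachoric_cos_approx(a=5, b=64, c=25, d=6)
print(round(r_tet, 2))
```

For these frequencies the approximation agrees with the Table H value of .93 to two decimal places. When bc equals ad the angle is 90 degrees and the estimate is zero, as one would hope.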


Case F

X is dichotomous, with an underlying normal distribution, and Y is ordinal.
We know of no coefficient especially appropriate for describing the
relationship between variables measured in such an unorthodox combination of
ways. If such measurements do arise, we suggest that you forego the
assumption of normality underlying the dichotomy on X and proceed with the
calculation of Cureton's rank-biserial correlation coefficient, which is
discussed under Case H, below.

Case G


One variable yields dichotomous measures with an underlying normal
distribution, the other yields interval or ratio measures; the biserial
correlation coefficient. Suppose that variable Y is measured dichotomously
even though one might think that more extensive or refined techniques
could produce a nearly normal distribution of scores on Y. We say that the
measurement of Y has produced a dichotomy with an underlying normal
distribution. Thus Y is measured in precisely the same way as either of the
variables involved in the calculation of a tetrachoric correlation coefficient.
However, in this instance the data produced by measurement of X can be
thought of as scores on an interval or ratio scale that are approximately
normally distributed. For example, the X scores might be scores on the
Scholastic Aptitude Test, and the Y scores might be the 0's and 1's on a test
item measuring cognitive flexibility. With more elaborate tests of cognitive
flexibility, it might have been possible to obtain a set of cognitive
flexibility scores that were normally distributed. These two arrays of
normally distributed scores could then be correlated and $r_{xy}$ found.
What do the X scores and the dichotomous Y scores tell us about this
$r_{xy}$? An estimate of it is provided by the biserial correlation
coefficient, $r_{bis}$. Note that the data for $r_{bis}$ differ from those for
$r_{tet}$: for $r_{tet}$ both variables were dichotomous with underlying
normal distributions.

The data gathered for the calculation of $r_{bis}$ consist of an X score,
which can be any one of several different values, and a Y score, which is
either 0 or 1, for each of n persons.

Suppose that a teacher wishes to relate the amount of time (X) students
spend studying the balancing of chemical equations and their ability (Y) to
balance such equations. Measurements of X are gathered by students' own
reports of the homework time they gave to the subject. Although Y
could probably be measured by an extensive achievement test in such a way
that a nearly normal distribution of scores would result, suppose that the
teacher's time available for testing permits the administration of only one
item, a chemical equation to be balanced. Thus a normal distribution of
ability is assumed to underlie the 0 and 1 (incorrect and correct) scores on
the test item. Data that might result from such a study appear in Table 9.5.
They show that 11 of the 18 students balanced the equation correctly and that
they scored higher on X than did the 7 students who balanced it incorrectly.

The rationale of $r_{bis}$ rests on the regression coefficient for predicting
X from Y. The regression coefficient for predicting X from the normally
distributed Y is

$$b = r_{xy}\,\frac{s_x}{s_y},$$

where $s_y$ is the standard deviation of the hypothetically normally distributed
Y, not of the dichotomous measures taken on Y. The slope of the least-squares
regression line for predicting X from Y can be approximated by the
slope of the line that passes through the mean X score for those scoring 0
on Y (which is denoted $\bar{X}_0$) and the mean X score for those scoring 1
on Y (which is denoted by $\bar{X}_1$). This latter line is drawn on the
scatter diagram in Table 9.5. This line is a least-squares line in a sense,
because the mean of a group is the point around which the sum of squared
deviations is at a minimum.

The slope of this line is equal to $(\bar{X}_1 - \bar{X}_0)$ divided by the
distance between the mean scores on the normally distributed Y for those
scoring 0 and 1 on the dichotomy. This latter distance is difficult to find
in any elementary manner (see Walker and Lev, 1953, for a more advanced
discussion). The desired distance turns out to involve the height u of the
unit normal curve above the point on the abscissa (i.e., the Y axis here)
above which $100(n_1/n)$ percent of the area lies. ($n_1$ is the number of
persons scoring 1 on the dichotomy.) Combining these facts appropriately
yields the following computational formula for $r_{bis}$:

$$r_{bis} = \frac{\bar{X}_1 - \bar{X}_0}{s_x} \cdot \frac{n_1 n_0}{u\,n\sqrt{n^2 - n}}, \tag{9.9}$$

where $\bar{X}_1$ and $\bar{X}_0$ are the mean X scores for those scoring 1
and 0 on Y, respectively;

$s_x$ is the standard deviation of the X scores;

$n_1$ and $n_0$ are the numbers of 1's and 0's on Y, respectively
($n_1 + n_0 = n$); and

u is the ordinate (i.e., the height) of the unit normal distribution at
the point above which lies $100(n_1/n)$ percent of the area
under the curve (see Table B in Appendix A).

The calculation of $r_{bis}$ from Eq. (9.9) will be illustrated with the data
in Table 9.5.

Since $n_1/n = 11/18 = .61$, the ordinate u of the unit normal curve for
the point above which lies 61% of the area must be found. In Table B in
Appendix A this height is found to be .3836. The summary statistics from
Table 9.5 are substituted into Eq. (9.9):

$$r_{bis} = \frac{12.36 - 10.00}{2.55} \cdot \frac{(11)(7)}{(.3836)(18)\sqrt{18^2 - 18}} = .59.$$

variable is changed into a dichotomous variable by combining two adjacent
categories).*
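The substitution into Eq. (9.9) can be checked with a short Python sketch. The function name is hypothetical, and the use of the standard library's `statistics.NormalDist` in place of Table B is an assumption made for illustration; the tabled ordinate .3836 can also be supplied directly.

```python
import math
from statistics import NormalDist

def biserial_r(mean_1, mean_0, s_x, n_1, n_0, u=None):
    """Biserial correlation from summary statistics, following Eq. (9.9)."""
    n = n_1 + n_0
    if u is None:
        # Point on the unit normal curve above which the proportion n_1/n lies,
        # and the height (ordinate) of the curve at that point
        z = NormalDist().inv_cdf(1.0 - n_1 / n)
        u = NormalDist().pdf(z)
    return ((mean_1 - mean_0) / s_x) * (n_1 * n_0) / (u * n * math.sqrt(n * n - n))

# Summary statistics quoted in the text for Table 9.5
r_from_table_b = biserial_r(12.36, 10.00, 2.55, n_1=11, n_0=7, u=0.3836)
r_from_exact_u = biserial_r(12.36, 10.00, 2.55, n_1=11, n_0=7)
print(round(r_from_table_b, 2), round(r_from_exact_u, 2))
```

Both routes give a value of about .59; the small difference between them arises only because the text rounds 11/18 to .61 before entering the table.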


Case H 


Both variables yield ordinal measures. Spearman rank-correlation coefficient.
Raw data may be converted to ranks, or ranks may be gathered as
the original data. "Rank in graduating class" is an example of the conversion
of ordered scores to ranks: grade-point averages are computed for each of
500 students, say; a rank of 1 is assigned to the highest GPA, 2 to the next
highest, . . . , and 500 to the lowest. "Judges' rankings of excellence of a
recitation" is an example of gathering ranks as original data: 10 students
recite a passage, and a judge assigns the rank 1 to the best recitation, 2 to
the second best recitation, . . . , and 10 to the worst recitation. Data are
often gathered in this form when more refined measurements are not
convenient, needed, or possible. Regardless of how the ranks 1, 2, . . . ,
n - 1, n are obtained, two sets of ranks for the same n persons can be
correlated in the same way.

Suppose that 12 students are ranked by a single judge on the basis of
both overt hostility toward their teacher (X) and overt hostility toward other
students (Y). A rank of 1 is given to the student showing the greatest
hostility toward the teacher; a rank of 12 indicates that student who showed
the least hostility toward the teacher. The same procedure is used for
hostility toward other students. The data appear in Table 9.6. Student A was
ranked second on X and sixth on Y. One way of describing the relationship
between the two sets of ranks would be to compute the product-moment
correlation coefficient between the n paired ranks, i.e., simply substitute the
data of Table 9.6 into the formula for $r_{xy}$. This rather obvious technique
was employed first by Francis Galton. Charles Spearman, the British
psychologist, made more extensive use of it and has been rewarded by having
the coefficient named in his honor. Karl Pearson apparently felt that the
honor was due Galton. The product-moment correlation coefficient computed on
two sets of the n consecutive, untied ranks 1, . . . , n is known as Spearman's
rank-correlation coefficient. We shall symbolize this coefficient by $r_s$.

The fact that X and Y take on only the values 1, 2, . . . , n leads to
enormous simplifications in the formula for $r_s$.

To avoid spuriously high r

TABLE 9.6 THE TWO SETS OF RANKS ASSIGNED BY A JUDGE TO TWELVE
STUDENTS ON HOSTILITY TOWARD THEIR TEACHER (X) AND
HOSTILITY TOWARD OTHER STUDENTS (Y)

Columns: Student (A through L); Hostility toward Teacher (X); Hostility
toward Students (Y); Computations: X², Y², (X - Y)².

Sums: ΣX = 78

The sums of the two sets of ranks should be equal; this will always hold, as
long as no tied ranks are allowed. It can be shown that

$$\sum X = 1 + 2 + \cdots + n = \frac{n(n + 1)}{2}.$$

In Table 9.6, we have

$$\frac{n(n + 1)}{2} = \frac{12(13)}{2} = 78.$$

The mathematician and astronomer Karl Gauss is reputed to have astonished
his tutor by deriving this result in the following manner:

If

$$\sum X = 1 + 2 + \cdots + (n - 1) + n,$$

then of course

$$\sum X = n + (n - 1) + \cdots + 2 + 1$$

also (i.e., the same series in reverse order). Add the two equations together
vertically and obtain

$$2\sum X = (n + 1) + (n + 1) + \cdots + (n + 1) = n(n + 1),$$

because there are n of the (n + 1)'s. Divide both sides of the equation by
2; you obtain

$$\sum X = \frac{n(n + 1)}{2},$$

the sum of the n consecutive, untied ranks. Divide both sides
of the above equation by n and you have

$$\bar{X} = \frac{n + 1}{2}, \tag{9.12}$$

the mean of the n consecutive, untied ranks 1, 2, . . . , n.

Using similar, but slightly more complicated techniques, it can be shown
(see Siegel, 1956, or Edwards, 1964) that if X takes on the n consecutive and
untied ranks 1, 2, . . . , n, the variance of the X's is

$$s_x^2 = \frac{n^2 - 1}{12}. \tag{9.13}$$

Taking Eqs. (9.12) and (9.13) into consideration and performing other
algebraic simplifications yields the following formula for $r_s$:

$$r_s = 1 - \frac{6\sum (X_i - Y_i)^2}{n(n^2 - 1)}, \tag{9.14}$$

where $X_i - Y_i$ is the difference between the ith person's rank on X and his
rank on Y.

The value of $r_s$ for the two sets of ranks in Table 9.6 is found as follows:

$$r_s = 1 - \frac{6(84)}{12(144 - 1)} = 1 - \frac{42}{143} = .71.$$

Thus there is a fairly strong direct relationship between the students' overt
hostility toward their teacher and toward other students as ranked by the
judge.

If $r_s$ is computed on a desk calculator, the following formula, which is
algebraically equivalent to Eq. (9.14), will be more convenient:

$$r_s = \frac{3}{n - 1}\left\{\frac{4\sum X_i Y_i}{n(n + 1)} - (n + 1)\right\}. \tag{9.15}$$

Equation (9.15) avoids the calculation of $X_i - Y_i$, which may need to be
written down when calculations are made on a desk calculator. Each journey of
a datum from machine to paper and back to machine is an opportunity to err.

Often, though, the deviations can be squared and summed mentally,
without writing down anything except the sum of the squared deviations
[needed in Eq. (9.14)]. Stanley (1964, pp. 375-78) offers a table from which
$r_s$ can be secured readily when one knows $\sum (X_i - Y_i)^2$ and n, for any
value of n from 2 through 10. For further discussion of the use of $r_s$ in
practical situations, see Stanley (1964, pp. 98-100 and 379-81).
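The claim that Eqs. (9.14) and (9.15) both reproduce the product-moment correlation of untied ranks can be checked numerically. The sketch below is a minimal illustration; the function names and the six pairs of ranks are hypothetical (they are not the Table 9.6 data), and the last line simply repeats the arithmetic quoted for Table 9.6.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_from_differences(x, y):
    """Eq. (9.14): uses the sum of squared rank differences."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def spearman_from_products(x, y):
    """Eq. (9.15): uses the sum of products of paired ranks."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return (3 / (n - 1)) * (4 * sum_xy / (n * (n + 1)) - (n + 1))

# Hypothetical untied ranks for six persons
x_ranks = [1, 2, 3, 4, 5, 6]
y_ranks = [2, 1, 4, 3, 6, 5]

r_direct = pearson_r(x_ranks, y_ranks)
r_914 = spearman_from_differences(x_ranks, y_ranks)
r_915 = spearman_from_products(x_ranks, y_ranks)

# Arithmetic quoted for Table 9.6: n = 12 and a sum of squared differences of 84
r_table_96 = 1 - 6 * 84 / (12 * (12 ** 2 - 1))
print(round(r_direct, 4), round(r_914, 4), round(r_915, 4), round(r_table_96, 2))
```

The three routes agree to within rounding error for any set of untied ranks, which is the point of the simplification.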

The Problem of Tied Ranks 

Ties in measurements often occur. When they do, there is a special rule
for assigning ranks. For example, if the 12th and 13th highest ranking
students in a graduating class of 245 each has a grade-point average of 4.76,
it is customary to assign both students the average of the two ranks:
(12 + 13)/2 = 12.5. Or a judge may find it impossible to discriminate between
the quality of the handwriting of the top three students, so he gives all three
students the average of the top three ranks, 2 = (1 + 2 + 3)/3. (As a
generous gesture, some judges would give all three a rank of 1, but that fails
to preserve the sum of the 3 untied ranks 1, 2, 3.)

When tied ranks occur, neither Eq. (9.14) nor Eq. (9.15) is equivalent to
the $r_{xy}$ between the ranks. Even though the mean of the numbers
1, 2, . . . , n does not change when ties in the ranks occur [it is still
(n + 1)/2], the variance of the ranks is less than $(n^2 - 1)/12$.
Consequently, the variance simplifications in the formula for $r_{xy}$ that led
to Eqs. (9.14) and (9.15) cannot be made.

There are three possible ways to proceed in the calculation of $r_s$ when
tied ranks occur: (a) use the computational formula for $r_{xy}$ on the data;
this will always give $r_s$, whether any ranks are tied or not; (b) use a
formula (see Kendall, 1955) that incorporates corrections of $s_x^2$ and
$s_y^2$ for the ties in the ranks; (c) compute an approximation to $r_s$ via
Eq. (9.14) or (9.15), if there are few ties. With the current availability of
fully automatic desk calculators and electronic computers we suggest method
(a): that you compute the $r_{xy}$ between the ranks, having assigned ranks to
tied measurements by the averaging method described above.
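The averaging rule and method (a) can be sketched as follows. The helper names and the two sets of scores are hypothetical, chosen so that the top three persons are tied on X; the sketch is only an illustration of the procedure, not a prescribed routine.

```python
import math

def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest, giving tied scores
    the average of the ranks they jointly occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next score equals the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = ((start + 1) + (end + 1)) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical scores: the top three persons are tied on X
x_scores = [9, 9, 9, 7, 5]
y_scores = [8, 6, 7, 5, 4]

x_ranks = average_ranks(x_scores)   # the tied persons share rank (1 + 2 + 3)/3 = 2
y_ranks = average_ranks(y_scores)

# Method (a): product-moment r computed on the ranks themselves
r_s = pearson_r(x_ranks, y_ranks)

# Eq. (9.14) applied to the same ranks is only an approximation when ties occur
n = len(x_ranks)
r_approx = 1 - 6 * sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks)) / (n * (n * n - 1))
print(x_ranks, round(r_s, 3), round(r_approx, 3))
```

With these scores the sum of the ranks is still n(n + 1)/2 = 15, but the two values of the coefficient differ slightly, which illustrates why Eq. (9.14) is no longer exact once ranks are tied.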


Interpretation and Use of $r_s$

No special interpretation can be given to $r_s$ over and above the statement
that it equals the product-moment correlation coefficient calculated on ranks.
The value of $r_s$ can never be less than -1 nor greater than +1. It equals
+1 only if each person has exactly the same ranks on both X and Y.
The Spearman rank-correlation coefficient is especially useful when the
original data are ranks, as when judges rank persons or things. It is
sometimes regarded as a quick means of estimating $r_{xy}$. The original data
are ranked, $r_s$ is computed, and a trigonometric function of $r_s$ is
supposed to convert $r_s$ back into the value of $r_{xy}$. This procedure
accomplishes this only approximately, and then only for very large n. Desk
calculators and electronic computers have taken enough of the labor out of
calculation so that there is no longer any reason not to calculate $r_{xy}$ on
unranked data.

The Spearman rank-correlation coefficient $r_s$ is very closely tied to the
concept of a product-moment correlation coefficient. Another coefficient
of correlation exists for data that are ranks on both X and Y; it involves
a rather different concept of what "relationship" is and how it should be
measured.


Kendall's Tau, τ

Most of the correlation coefficients in this book so far employ the
product-moment rationale of Pearson in one form or another. Some are merely
the product-moment formula applied to dichotomous or ordinal data, e.g., phi
and $r_s$; others are attempts to approximate the Pearson r, e.g., $r_{tet}$
and $r_{bis}$. The English statistician Maurice Kendall made one of the few
attempts at conceptualizing the measurement of relationship between variables
in a manner other than by product moments. His effort resulted in one of the
few basically new approaches to correlation in modern statistics. For a
readable account of his work, see Kendall (1955).

As with $r_s$, the observations on both X and Y are the n consecutive and
untied ranks 1, 2, . . . , n. Kendall based his coefficient of correlation on
the number of pairs of persons that are ordered in the same direction on both
X and Y. Thus, his measure, called tau and denoted τ, is merely an indicator
of the extent of disagreement in the rankings on X and Y.

Suppose that the ranks in Table 9.7 are assigned to eight persons on X and
Y. Suppose we select any pair of persons, say A and B. If A is ranked
above B on X, but B is ranked above A on Y, this represents a departure
from a direct relationship between X and Y; the inversion of the relative
status of A and B as we move from X to Y argues for an inverse relationship
between X and Y, at least as far as A and B are concerned. There are
n(n - 1)/2 pairs of persons, and the contribution of each pair to the
relationship between X and Y must be determined. For any pair of persons, an
agreement is counted if their order on X and Y is the same. In Table 9.7,
A and B contribute an agreement because A ranks higher than B on X (1 vs. 4)
and on Y (3 vs. 5). For any pair of persons, an inversion is counted if their
order on X and Y is different. Persons A and C contribute an inversion; note
that F and D also contribute one inversion: F has a higher rank than D on X
but a lower rank on Y.

TABLE 9.7 ILLUSTRATION OF THE CALCULATION OF KENDALL'S TAU.
(1 IS THE HIGHEST RANK)

Person   X   Y   Agreements   Inversions
A        1   3       5            2
C        2   1       6            0
-        3   2       5            0
B        4   5       3            1
E        5   7       1            2
F        6   8       0            2
D        7   4       1            0
-        8   6       -            -
                   P = 21        Q = 7

n = 8;  τ = (P - Q)/[n(n - 1)/2] = (21 - 7)/28 = .50

Every pair of persons contributes either an agreement or an inversion. What
will be the numbers of agreements and inversions when Y is the reverse of
X? Zero and n(n - 1)/2. Of the various ways to define a coefficient from the
numbers of agreements and inversions, Kendall chose the following definition
for his coefficient τ:

$$\tau = \frac{(\text{total number of agreements}) - (\text{total number of inversions})}{n(n - 1)/2}. \tag{9.16}$$
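The counting that Eq. (9.16) calls for can be sketched in Python. The function names are hypothetical. The eight pairs of ranks below are an assumption: they are a set of untied ranks consistent with the totals of 21 agreements and 7 inversions reported for Table 9.7.

```python
from itertools import combinations

def count_agreements_inversions(x, y):
    """Count the pairs of persons ordered the same way on X and Y (agreements)
    and the pairs ordered in opposite ways (inversions)."""
    agreements = inversions = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            agreements += 1
        elif direction < 0:
            inversions += 1
    return agreements, inversions

def kendall_tau(x, y):
    """Kendall's tau for untied ranks, following Eq. (9.16)."""
    n = len(x)
    p, q = count_agreements_inversions(x, y)
    return (p - q) / (n * (n - 1) / 2)

# Assumed ranks for eight persons, listed in order of their rank on X
x_ranks = [1, 2, 3, 4, 5, 6, 7, 8]
y_ranks = [3, 1, 2, 5, 7, 8, 4, 6]

p, q = count_agreements_inversions(x_ranks, y_ranks)
tau = kendall_tau(x_ranks, y_ranks)
print(p, q, tau)
```

Every one of the n(n - 1)/2 = 28 pairs is examined once, so the work grows roughly with the square of n; this is the labor that the algebraic shortcuts are meant to reduce.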


The process of counting the agreements and inversions in the ranks on X
and Y for n(n - 1)/2 pairs of persons can be laborious. Fortunately, it
can be simplified by algebraic manipulation of Eq. (9.16).

Let us call the total number of agreements P and the total number of
inversions Q. In this notation, Eq. (9.16) becomes

$$\tau = \frac{P - Q}{n(n - 1)/2}.$$

It is customary to denote P - Q by the symbol S so that τ may be written

$$\tau = \frac{S}{n(n - 1)/2}. \tag{9.17}$$

Since the sum of P and Q must be n(n - 1)/2, we can write Eq. (9.17) as

$$\tau = \frac{P - Q}{n(n - 1)/2} = \frac{[n(n - 1)/2 - Q] - Q}{n(n - 1)/2} = 1 - \frac{4Q}{n(n - 1)}. \tag{9.18}$$

Kendall's τ With Tied Ranks

When ties occur in the ranks on either X or Y, P and Q are still determined
as indicated in Table 9.7. The only alteration in the formula for τ occurs in
the denominator. The correction of the denominator of τ involves the
quantities $K_x$ and $K_y$ (both of which are functions of the numbers of
persons tied at the various ranks on X and Y).

The following formula is applied when ties occur in the ranks on X and Y:

$$\tau = \frac{S}{\sqrt{[n(n - 1)/2] - K_x}\,\sqrt{[n(n - 1)/2] - K_y}}, \tag{9.20}$$

where $K_x = (\tfrac{1}{2})\sum f_i(f_i - 1)$ (where $f_i$ is the number of tied
observations in each group of ties on X), and

$K_y = (\tfrac{1}{2})\sum f_i(f_i - 1)$ (where $f_i$ is the number of tied
observations in each group of ties on Y).

For an illustration of the application of Eq. (9.20), see Siegel (1956, pp.
218-19) or a more recent textbook on nonparametric statistics.
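A brief sketch may make the tie correction of Eq. (9.20) concrete. The function names and the five pairs of tied ranks are hypothetical. The sketch assumes that a pair tied on either variable contributes neither an agreement nor an inversion, and that the correction is half the sum of f(f - 1) over the groups of tied ranks on each variable.

```python
import math
from collections import Counter
from itertools import combinations

def tie_correction(ranks):
    """One half the sum of f(f - 1) over groups of tied ranks; zero without ties."""
    return 0.5 * sum(f * (f - 1) for f in Counter(ranks).values())

def kendall_tau_with_ties(x, y):
    """Kendall's tau with the tie-corrected denominator of Eq. (9.20)."""
    n = len(x)
    s = 0  # S = P - Q; pairs tied on X or on Y add nothing
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        s += (direction > 0) - (direction < 0)
    pairs = n * (n - 1) / 2
    return s / (math.sqrt(pairs - tie_correction(x)) * math.sqrt(pairs - tie_correction(y)))

# Hypothetical ranks: two persons tied for first on X, two tied for last on Y
x_ranks = [1.5, 1.5, 3, 4, 5]
y_ranks = [1, 2, 3, 4.5, 4.5]
tau_tied = kendall_tau_with_ties(x_ranks, y_ranks)

# Without ties both corrections vanish and the value equals 1 - 4Q/[n(n - 1)]
untied_x = [1, 2, 3, 4, 5, 6, 7, 8]
untied_y = [3, 1, 2, 5, 7, 8, 4, 6]   # a set of ranks with Q = 7 inversions
tau_untied = kendall_tau_with_ties(untied_x, untied_y)
tau_shortcut = 1 - 4 * 7 / (8 * 7)
print(round(tau_tied, 3), tau_untied, tau_shortcut)
```

Only the denominator changes when ties are present, so the same routine serves both the tied and the untied case.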


Case I


X is ordinal and Y is interval or ratio. No coefficient has been developed
and studied for this particular case. If you find yourself with variables
measured in these ways, it would be advisable to convert the Y scores to
ranks and proceed with the calculation of either Spearman's or Kendall's
rank-correlation coefficients.


Return to Case C 

Rank-biserial correlation. At this point it is appropriate to discuss the
coefficient first mentioned under Case C, above. One notable coefficient
for correlating a dichotomous variable X and an ordinal variable Y has been
investigated by Cureton (1956) and Glass (1966b). This coefficient is closely
related to Kendall's τ and incorporates in its definition the concepts of
agreements and inversions. We shall denote this coefficient by $r_{rb}$, the
rank-biserial r.

Let X be a dichotomous variable and Y a variable comprising the n
untied ranks 1, 2, . . . , n. Cureton sought a coefficient descriptive of the
relationship between X and Y such that (a) it would have attainable limits
±1 under all circumstances, (b) it would be +1 when the $n_1$ highest ranks
are all 1 on the dichotomy, and (c) it would be strictly nonparametric, i.e.,
defined wholly in terms of inversions and agreements without such concepts
as mean, variance, regression, etc. Suppose that the following data are
gathered on X and Y for n = 10 persons:

Person   X   Y
A        0   1
B        1   10
C        0   2
D        1   9
E        0   5
F        0   8
G        1   4
H        1   7
I        0   3
J        0   6

To compute $r_{rb}$ the data are arranged in the following manner:

   Ranks on Y for
X = 1    X = 0    Agreements   Inversions
 10        8          6            0
  9        6          6            0
  7        5          5            1
  4        3          3            3
           2
           1
                    P = 20        Q = 4

An agreement is counted for a given rank under column 1 for every smaller
rank under column 0. An inversion is counted for a given rank under column 1
for every larger rank under column 0. Thus there are three agreements
corresponding to the rank 4 under column 1, since there are three smaller
ranks, 3, 2, and 1, under column 0. P is the sum of all agreements in the
table and Q is the sum of all inversions. The coefficient is defined as

$$r_{rb} = \frac{P - Q}{P_{max}}. \tag{9.21}$$

When no ties exist, $P_{max} = n_0 n_1$, where $n_0$ is the number of persons
at 0 on the dichotomy and $n_1$ is the number of persons at 1. Hence

$$r_{rb} = \frac{P - Q}{n_0 n_1}. \tag{9.22}$$

For the above data,

$$r_{rb} = \frac{20 - 4}{(4)(6)} = \frac{16}{24} = .67.$$

Glass (1966b) has shown that $r_{rb}$ is algebraically equivalent to a
coefficient analogous to $r_{bis}$ for ordinal variables. The practical
importance of this equivalence is that it provides an easy means of computing
$r_{rb}$ without counting agreements and inversions. The following formula can
be shown to yield the same value as that given by Eq. (9.22):

$$r_{rb} = \frac{2}{n}(\bar{Y}_1 - \bar{Y}_0), \tag{9.23}$$

where $\bar{Y}_1$ is the average rank of those scoring 1 on X and
$\bar{Y}_0$ is the average rank of those scoring 0 on X.

The data on which the calculation of $r_{rb}$ by Eq. (9.22) was illustrated
will be used to illustrate use of Eq. (9.23).

   Ranks on Y for
X = 1    X = 0    Calculations
 10        8      $n_1 = 4$, $n_0 = 6$
  9        6
  7        5      $\bar{Y}_1 = 30/4 = 7.500$, $\bar{Y}_0 = 25/6 = 4.167$
  4        3
           2      $r_{rb} = (2/10)(7.500 - 4.167) = .67$
           1

When no ties occur in Y, Eq. (9.23) can always be used in place of
Eq. (9.22). For a discussion of the case when there are ties in Y see Cureton
(1968).
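The equivalence of Eqs. (9.22) and (9.23) can be verified on the ten-person example with a short Python sketch. The function names are hypothetical, and the sketch assumes untied ranks on Y.

```python
def rank_biserial_by_counting(x, y):
    """Eq. (9.22): (P - Q) / (n0 * n1), counting agreements and inversions."""
    ranks_of_ones = [rank for score, rank in zip(x, y) if score == 1]
    ranks_of_zeros = [rank for score, rank in zip(x, y) if score == 0]
    p = sum(1 for a in ranks_of_ones for b in ranks_of_zeros if a > b)
    q = sum(1 for a in ranks_of_ones for b in ranks_of_zeros if a < b)
    return (p - q) / (len(ranks_of_ones) * len(ranks_of_zeros))

def rank_biserial_by_means(x, y):
    """Eq. (9.23): (2 / n) times the difference between the mean ranks."""
    ranks_of_ones = [rank for score, rank in zip(x, y) if score == 1]
    ranks_of_zeros = [rank for score, rank in zip(x, y) if score == 0]
    mean_1 = sum(ranks_of_ones) / len(ranks_of_ones)
    mean_0 = sum(ranks_of_zeros) / len(ranks_of_zeros)
    return (2 / len(x)) * (mean_1 - mean_0)

# Persons A through J from the example in the text
x = [0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
y = [1, 10, 2, 9, 5, 8, 4, 7, 3, 6]

r_counting = rank_biserial_by_counting(x, y)
r_means = rank_biserial_by_means(x, y)
print(round(r_counting, 2), round(r_means, 2))
```

Both functions return 2/3 for these data, in agreement with the value of .67 obtained by hand.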

Whitfield (1947) derived a coefficient for correlating one dichotomous
and one ordinal variable. His rationale was to consider the dichotomous
variable to be a ranking variable tied at two ranks. He then applied Kendall's
τ formula for tied ranks. The resulting coefficient has the same numerator
as $r_{rb}$ but a different denominator. The rank-biserial coefficient is to be
preferred as a measure of correlation because Whitfield's coefficient may not
attain +1 when some perfect relationships between X and Y exist. For
example, when n = 5, $n_1$ = 2, and $n_0$ = 3 and the fourth- and fifth-ranking
persons on Y have the two scores of 1 on X, $r_{rb}$ = 1, but Whitfield's
coefficient equals only .77.
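The five-person example can be reproduced numerically. The sketch below assumes that Whitfield's coefficient is obtained by treating the dichotomy as a ranking tied at two ranks and applying the tie-corrected denominator of Eq. (9.20); the variable names are hypothetical.

```python
import math
from itertools import combinations

# Five persons: the two with the largest ranks on Y score 1 on X
x = [0, 0, 0, 1, 1]
y = [1, 2, 3, 4, 5]
n = len(x)
n_1 = sum(x)
n_0 = n - n_1

# S = P - Q; pairs tied on X contribute nothing
s = 0
for i, j in combinations(range(n), 2):
    direction = (x[i] - x[j]) * (y[i] - y[j])
    s += (direction > 0) - (direction < 0)

# Rank-biserial coefficient, Eq. (9.22)
r_rb = s / (n_0 * n_1)

# Tie correction for X: one group of n_0 tied persons and one of n_1; Y has no ties
pairs = n * (n - 1) / 2
k_x = 0.5 * (n_0 * (n_0 - 1) + n_1 * (n_1 - 1))
whitfield = s / (math.sqrt(pairs - k_x) * math.sqrt(pairs))
print(r_rb, round(whitfield, 2))
```

The numerator is the same in both coefficients; only the denominator keeps the tie-corrected value from reaching 1.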



9.4

PART CORRELATION AND
PARTIAL CORRELATION


Concepts from simple linear regression and correlation are combined in
part correlation and partial correlation. We shall begin this section with
the development of part correlation, since partial correlation is its
generalization, statistically at least.

A researcher wishes to determine the correlation between a measure of
intelligence X and learning performance during an instructional unit in
social studies. He chooses to measure X with the Kuhlmann-Anderson
intelligence test; however, he faces some important decisions about how to
measure learning performance. He can construct a respectable achievement
test of the content being learned. But to give the test and score it for
"number of correct answers" is not what this researcher means by the words
"learning performance." A large correlation of X with the achievement-test
score might result even though no learning at all took place during
instruction, because variability on the achievement test could be due to
intelligence and test-wiseness and not be at all due to differential learning
during instruction. Administering the achievement test both before and after
instruction and subtracting each student's initial score from his
post-instruction score produces a measure that is far closer to the
researcher's notion of a measure of "learning performance." One slight
difficulty remains. Such "posttest minus pretest" measures of learning would
have a predictable negative correlation with intelligence due, in part, to the
manner in which measurement errors are combined by subtracting one fallible
measure from another. In fact, it is almost certain that these "difference
scores" will have a negative correlation with the pretest scores upon which
they are based. This is considered a defect of such "difference scores" when
we have reason to believe that amount of learning should not necessarily
correlate negatively with pretest status. An alternative method is to measure
learning or change by fitting a straight regression line to the pretest and
posttest achievement-test data and taking the deviation from the regression
line (errors of estimate) measured along the posttest axis. This deviation,
called a residual change score or residual gain score, is illustrated in
Figure 9.6. Let Y and Z denote the posttest and pretest, respectively. (The Z
here is not the standard score z mentioned earlier.)

The residual gain score is

$$e_{y \cdot z} = Y - (b_0 + b_1 Z),$$

where $b_0$ and $b_1$ are the intercept and slope of the least-squares
regression line for predicting Y from Z. It is denoted by $e_{y \cdot z}$
because it is precisely the same as the error made in predicting Y from Z by
the linear regression line. We know from previous work with regression that
the correlation of $e_{y \cdot z}$ with Z is always zero. As a measure of
learning, then, $e_{y \cdot z}$ has the property that the measure of how much
has been learned is unrelated (r = 0) with initial performance. The
researcher may consider this property desirable.*

The correlation of X with $e_{y \cdot z}$ is called a part correlation. In a
sense, it is the correlation of X with Y after the portion of Y that can be
predicted linearly from Z has been removed from Y. However, such an ambiguous
verbal definition will not substitute for the unequivocal definition embodied
in the symbol $r_{x e_{y \cdot z}}$.

Of course, the part-correlation coefficient $r_{x e_{y \cdot z}}$ could be
found by actually calculating the values of $e_{y \cdot z}$ from the regression
line of Y on Z and correlating these values with X. However, we shall now see
how this computational labor can be bypassed.

By definition, $r_{x e_{y \cdot z}}$ is given by

$$r_{x e_{y \cdot z}} = \frac{s_{x e_{y \cdot z}}}{s_x s_{e_{y \cdot z}}}, \tag{9.24}$$

i.e., by the ratio of the covariance of X with $e_{y \cdot z}$ to the product
of the two standard deviations.

From Sec. 8.3, we know $s_{e_{y \cdot z}} = s_y\sqrt{1 - r_{yz}^2}$. It
remains to evaluate the numerator of Eq. (9.24):

$$s_{x e_{y \cdot z}} = s_{x[Y - (b_0 + b_1 Z)]} = s_{xy} - 0 - b_1 s_{xz}. \tag{9.25}$$

(The zero occurs because the covariance of X with the constant $b_0$ is zero.)

The regression slope $b_1$ in Eq. (9.25) is for predicting Y from Z; thus
$b_1 = r_{yz} s_y / s_z$. Combining these facts in Eq. (9.24) produces

$$r_{x e_{y \cdot z}} = \frac{s_{xy} - r_{yz}(s_y / s_z)s_{xz}}{s_x s_y\sqrt{1 - r_{yz}^2}}. \tag{9.26}$$

Dividing the numerator and denominator of Eq. (9.26) by $s_x s_y$ yields

$$r_{x e_{y \cdot z}} = \frac{(s_{xy}/s_x s_y) - r_{yz}(s_{xz}/s_x s_z)}{\sqrt{1 - r_{yz}^2}} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{1 - r_{yz}^2}}. \tag{9.27}$$

Thus we see that $r_{x e_{y \cdot z}}$ can be calculated directly from
$r_{xy}$, $r_{xz}$, and $r_{yz}$. For example, let X be the measure of
intelligence from the Kuhlmann-Anderson test and Z and Y the measures of
pretest and posttest achievement, respectively. Suppose that $r_{xy}$ = .70,
$r_{xz}$ = .50, and $r_{yz}$ = .80. The value of $r_{x e_{y \cdot z}}$ is

$$r_{x e_{y \cdot z}} = \frac{.70 - (.50)(.80)}{\sqrt{1 - (.80)^2}} = \frac{.70 - .40}{.60} = .50.$$

Thus, although the correlation $r_{xy}$ between X and Y is .70, when one
eliminates the linear relationship of Y with Z, the residual relationship of X
with $e_{y \cdot z}$ is only .50.

By no means is part correlation limited to applications in which Z and
Y are "pretests" and "posttests." One could, for example, apply the
part-correlation technique with X a measure of reading speed, Y a measure of
reading comprehension, and Z a measure of intelligence. In this instance,
$r_{x e_{y \cdot z}}$ measures the relationship between reading speed X and
that part of reading comprehension $e_{y \cdot z}$ that is unrelated to
intelligence.

* The properties of residual gain scores have been extensively studied; see
Bereiter (1963), Lord (1963), and Tucker, Damarin, and Messick (1966).
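The shortcut of Eq. (9.27) can be compared with the longer route of actually computing the errors of estimate. In the sketch below the function names and the six sets of scores are hypothetical; only the three correlations .70, .50, and .80 come from the text.

```python
import math

def pearson_r(u, v):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

def errors_of_estimate(y, z):
    """Residuals of Y about the least-squares line for predicting Y from Z."""
    n = len(y)
    mean_y, mean_z = sum(y) / n, sum(z) / n
    b1 = (sum((c - mean_z) * (d - mean_y) for c, d in zip(z, y))
          / sum((c - mean_z) ** 2 for c in z))
    b0 = mean_y - b1 * mean_z
    return [d - (b0 + b1 * c) for c, d in zip(z, y)]

def part_correlation(r_xy, r_xz, r_yz):
    """Eq. (9.27): correlation of X with the part of Y not predictable from Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1.0 - r_yz ** 2)

# Worked example from the text
text_example = part_correlation(0.70, 0.50, 0.80)

# Hypothetical scores for six persons: X intelligence, Y posttest, Z pretest
x = [100, 105, 110, 95, 120, 115]
y = [14, 18, 15, 12, 20, 19]
z = [10, 12, 13, 9, 15, 13]

direct = pearson_r(x, errors_of_estimate(y, z))
shortcut = part_correlation(pearson_r(x, y), pearson_r(x, z), pearson_r(y, z))
print(round(text_example, 2), round(direct, 4), round(shortcut, 4))
```

The direct and shortcut values agree to within rounding error for any data, which is why the residuals themselves never need to be computed.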

In one sense, partial correlation is a simple extension of part correlation.
To find the partial correlation of X and Y with Z "held constant" or "partialed
out," we merely calculate the two arrays of errors of estimate for predicting
X from Z and Y from Z and correlate them. Symbolically, the partial
correlation of X and Y with Z partialed out is given by

$$r_{xy \cdot z} = r_{e_{x \cdot z} e_{y \cdot z}},*$$

where $e_{x \cdot z} = X - (b_0 + b_1 Z)$ and
$e_{y \cdot z} = Y - (b_0^* + b_1^* Z)$.

* In the numerical-subscript notation sometimes used this becomes
$r_{12 \cdot 3}$.

Notice that there are two distinct regression lines involved in calculating
the two sets of errors of estimate (or "residuals"): the line for predicting
X from Z, which has constants $b_0$ and $b_1$, and the line for predicting Y
from Z, the constants for which have been labeled $b_0^*$ and $b_1^*$ to
distinguish them from those for the X and Z regression line.

As in the case of part correlation, the actual calculation of the errors
of estimate for all n persons is unnecessary in the calculation of
$r_{xy \cdot z}$. The partial correlation coefficient can be calculated
directly from $r_{xy}$, $r_{xz}$, and $r_{yz}$.

It can be shown, in a manner analogous to the derivation of Eq. (9.27)
for the part-correlation coefficient, that $r_{xy \cdot z}$ is given by the
following formula:

$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{1 - r_{xz}^2}\,\sqrt{1 - r_{yz}^2}}. \tag{9.28}$$

Next we shall see what interpretation may be given to a calculated
value of $r_{xy \cdot z}$. One might hypothesize that there exists a positive
correlation between reading performance X and visual perceptual ability Y (as
evidenced by eye coordination, scanning speed, etc.). Suppose a sample of 30
children, ranging in age (Z) from 6 years to 15 years, yields a correlation of
X with Y, $r_{xy}$, of .64. The conclusion that some children read better than
others because of greater perceptual abilities is tempting, but the cautious
researcher will avoid drawing it. It is obvious that as children grow older
they develop greater eye coordination and other perceptual abilities as a part
of natural maturation. Moreover, the same children receive instruction in
school which helps make them better readers year after year, up to a point.
Could it not be that measures of both X and Y increase (improve) with age, in
the one case as a consequence of physical maturation and in the other case as
a function of mental maturation and increased exposure to instruction? This
certainly could be the case. If the correlation of X and Y were zero at any
one level of chronological age (instead of over the range from 6 years to
15 years), the $r_{xy}$ of .64 for the sample of 30 children has far different
implications. Indeed, even the cautious researcher would be tempted to
conclude that the observed $r_{xy}$ of .64 was due to a common relationship of
both reading performance X and visual perceptual ability Y to chronological
age Z, and was not due to any direct relationship between X and Y. How can
we find what the value of $r_{xy}$ would be for any single value of the
chronological age variable Z? Under suitable assumptions, this desired
correlation equals the partial correlation of X and Y with Z held constant:
$r_{xy \cdot z}$.


• r,,-i is called * first-order partial-correlat.on coefficient, because the linear 
of one variable, Z, is pariiated out. r„ is a zero-order correlat ion coefficient, because n<*h' 


is partialed out. 
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Provided that Z has a linear relationship with both X and Y and that the 
strength of the linear relationship between X and Y is the same for persons 
at any level of Z, then r nx equals the value of r„ we would obtain by 
correlating X and Y for a group of persons having identical scores on Z. 
For example, in the above illustration suppose that r„ = .64, r zt = .80, 
and r„ = .80. The value of r„ , from Hq. (9.28) is 


■64 - (.80)(.8Q) 


Vl — .80* Vi - .80* (-60)(.60) 

Thus we would estimate the value of $r_{xy}$ for children of the same
chronological age to be zero. If enough children of the same chronological
age were available, we could calculate $r_{xy}$ for them alone to check the
above result. However, in our example there was not a sufficient number of
children of the same age; there were 30 who ranged from 6 to 15 years of
age. The partial-correlation coefficient serves the purpose of estimating
$r_{xy}$ for a single level of chronological age when there is an
insufficient number of persons at any single chronological age to do the
estimating by direct calculation. 
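
The arithmetic of Eq. (9.28) can be sketched in a few lines of Python. This is only an illustration: the function name `partial_correlation` and its argument names are our own labels, not part of the text. The three input values are those of the reading, perception, and age illustration above.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z held constant (Eq. 9.28)."""
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    if denominator == 0:
        # The coefficient is undefined when X or Y correlates perfectly with Z.
        raise ValueError("partial correlation undefined when r_xz or r_yz is +/-1")
    return (r_xy - r_xz * r_yz) / denominator


# Values from the illustration: r_xy = .64, r_xz = .80, r_yz = .80.
r_xy_z = partial_correlation(r_xy=0.64, r_xz=0.80, r_yz=0.80)
print(abs(r_xy_z) < 1e-9)  # the partial correlation is zero up to rounding error
```

Because the computation is done in floating point, the result is compared with zero within a small tolerance rather than tested for exact equality.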

More than one variable can be “partialed out,” using the techniques 
of multiple regression to be discussed in the next section. For formulas and 
explanations, see Ezekiel and Fox (1963). With partial correlation,
especially, one must keep in mind that correlation does not necessarily mean
causation. In the behavioral sciences causation is usually complex and
indirect. One must be humble intellectually to avoid assuming that he has
isolated a cause when other explanations of one’s results are at least as
plausible. For discussions of pitfalls see Lerner (1965). 

9.5 

MULTIPLE CORRELATION 

AND PREDICTION 


The final topic of this chapter is known as multiple correlation. It is an
extension of the ordinary Pearson product-moment correlational technique of
Chapter 7. Multiple correlation has a second side known as multiple
prediction. The purpose of multiple prediction is the estimation of a
variable Y (the dependent variable) from a linear combination of m
independent variables $X_1, \ldots, X_m$. Previously, one variable X was
used to estimate a second variable Y. Estimation there was equivalent to
choosing values of $b_0$ and $b_1$ such that 

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

was as small as possible. The equation $\hat{Y}_i = b_0 + b_1 X_i$ provides
the least-squares estimate $\hat{Y}_i$ of the ith person's score on the
variable Y. This type of estimation is sometimes termed univariate
estimation or prediction because there is only one “predictor variable.” A
multivariate prediction of the Y variable given scores on m independent
variables is 

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \ldots + b_m X_m. \tag{9.29}$$

Equation (9.29) provides a multiple-prediction or multiple-regression 
equation. Equation (9.29) is sometimes referred to as a linear regression 
equation since the b's appear only with exponents of 1 and never appear as 
squared terms, cubed terms, etc. Of course, Eq. (9.29) alone is of no value; 
the procedure by which “good” values for the b's are chosen must be 
specified. Again the least-squares criterion is invoked, and those values of 
$b_0, b_1, \ldots, b_m$ are chosen that minimize the quantity 

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{i1} - b_2 X_{i2} - \ldots - b_m X_{im})^2 \tag{9.30}$$

for a given set of values of Y and the X's. 

The values $b_0, \ldots, b_m$ that minimize the above quantity provide us 
with $b_0 + b_1 X_1 + \ldots + b_m X_m$ as a good estimate of Y: 

$$\hat{Y}_i = b_0 + b_1 X_{i1} + \ldots + b_m X_{im}.$$

The Pearson product-moment correlation between Y and $\hat{Y}$ is a measure 
of how well the “best” linear weighting of the independent variables
$X_1, \ldots, X_m$ predicts or correlates with the single dependent variable
Y. This special case of Pearson’s r is called the multiple correlation
coefficient and is denoted by $R_{y \cdot 12 \ldots m}$. A second sense in
which the values $b_0, \ldots, b_m$ that minimize Eq. (9.30) are “best” is
as follows: the maximum possible positive correlation between Y and any
linear combination of $X_1, \ldots, X_m$ is attained when the X’s are
combined into 

$$b_0 + b_1 X_1 + \ldots + b_m X_m,$$

the b's being those values that minimize Eq. (9.30). Thus, 

$$b_0 + b_1 X_1 + \ldots + b_m X_m$$

not only provides the least-squares estimate of Y but also correlates more
highly with Y than could any other linear combination of the X variables.
(As a consequence of how the weights for the X variables are derived, the
multiple correlation coefficient will always be positive or zero.) 

The theory and techniques of multiple prediction and correlation are 
involved and complex. A comprehensive treatment of these subjects would 
occupy many pages. Fortunately, there is no lack of excellent treatments of 
these topics in the pedagogical literature of applied statistics. You will find 
the theoretical and applied aspects of multiple regression developed fully 
in such texts as DuBois (1957), Rozeboom (1966), Draper and Smith (1966), 
and Acton (1959). For a briefer treatment see Williams (1968). 

In the remainder of this section we shall illustrate multiple prediction 
and correlation for the case in which two variables $X_1$ and $X_2$ are used
to predict the criterion variable Y. Suppose that Y is grade-point average
for college freshmen at the end of two semesters. One wishes to predict Y
from $X_1$, high-school grade-point average, and $X_2$, verbal ability as
measured by the Scholastic Aptitude Test. Many details of the following
discussion will be simplified if it is assumed that Y, $X_1$, and $X_2$ are
transformed to standard scores with mean zero and variance 1. Thus Y,
$X_1$, and $X_2$ become $z_y$, $z_1$, and $z_2$. We seek the values of
$b_0$, $b_1$, and $b_2$ that minimize the quantity 

$$\sum_{i=1}^{n} (z_{yi} - b_0 - b_1 z_{1i} - b_2 z_{2i})^2$$

for a group of n students for whom freshman GPA, Y, high-school GPA, $X_1$,
and SAT Verbal scores, $X_2$, are available. 

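Without substantiating the claim here, we state that the least-squares 
estimates $b_0$, $b_1$, and $b_2$ are as follows: 

$$b_0 = 0, \qquad b_1 = \frac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2}, \qquad b_2 = \frac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2}. \tag{9.31}$$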

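The fact that $b_0$ is zero is a consequence of having transformed Y, $X_1$,
and $X_2$ to standard scores; $b_0$ would not be 0 if these variables had
not been standardized. The quantities $r_{y1}$, $r_{y2}$, and $r_{12}$ are
simply the correlation coefficients between Y and $X_1$, Y and $X_2$, and
$X_1$ and $X_2$, respectively. 

The best (in the least-squares sense) estimate of the ith person’s standard
score on Y given his standard scores on $X_1$ and $X_2$ is 

$$\hat{z}_{yi} = b_1 z_{1i} + b_2 z_{2i}, \tag{9.32}$$

where $b_1$ and $b_2$ are as defined in Eq. (9.31). Transforming Eq. (9.32)
back into the scale of the original variables gives the following
multiple-regression equation for estimating Y from $X_1$ and $X_2$: 

$$\hat{Y}_i = b_1 \frac{s_y}{s_1} X_{1i} + b_2 \frac{s_y}{s_2} X_{2i} + \left(\bar{Y} - b_1 \frac{s_y}{s_1} \bar{X}_1 - b_2 \frac{s_y}{s_2} \bar{X}_2\right), \tag{9.33}$$

where $s_y$, $s_1$, and $s_2$ are the standard deviations of Y, $X_1$, and
$X_2$. 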

The correlation between the actual freshman grade-point averages and the
predicted grade-point averages obtained from Eq. (9.33) is the multiple
correlation coefficient, $R_{y \cdot 12}$, between freshman grade-point
average and the combination of high-school grade-point average and SAT
Verbal scores. One could compute $R_{y \cdot 12}$ directly from the n pairs
of values of $\hat{Y}_i$ and $Y_i$. However, once $r_{y1}$, $r_{y2}$, and
$r_{12}$ are known, the multiple correlation coefficient is given
conveniently by the following equation: 

$$R_{y \cdot 12} = \sqrt{b_1 r_{y1} + b_2 r_{y2}}. \tag{9.34}$$

The values of $b_1$ and $b_2$ in the above equation are, of course, obtained 
from the formulas in Eq. (9.31). Note that the positive square root is taken 
so that R will never be negative. It is not as simple to see that the term
under the radical will never be negative, but such is the case. 
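
The claim that Eq. (9.34) agrees with the direct correlation between Y and its least-squares estimate can be checked numerically. The sketch below uses a small set of made-up scores for eight hypothetical students (they are not data from any study cited here), and the helper names `pearson_r` and `standardize` are our own.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / math.sqrt(ss_a * ss_b)


def standardize(values):
    """Convert raw scores to standard scores (mean 0, variance 1)."""
    m = mean(values)
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / len(values))
    return [(x - m) / sd for x in values]


# Hypothetical scores, invented purely for illustration.
hs_gpa = [2.1, 2.8, 3.0, 3.4, 3.9, 2.5, 3.2, 3.7]          # X1
sat_verbal = [430, 520, 480, 610, 650, 500, 540, 700]      # X2
freshman_gpa = [1.9, 2.4, 2.9, 3.0, 3.6, 2.2, 2.7, 3.5]    # Y

r_y1 = pearson_r(freshman_gpa, hs_gpa)
r_y2 = pearson_r(freshman_gpa, sat_verbal)
r_12 = pearson_r(hs_gpa, sat_verbal)

# Standard-score weights from Eq. (9.31).
b1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
b2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Predicted standard scores from Eq. (9.32).
z1, z2 = standardize(hs_gpa), standardize(sat_verbal)
z_hat = [b1 * a + b2 * b for a, b in zip(z1, z2)]

r_direct = pearson_r(freshman_gpa, z_hat)        # correlate Y with its estimate
r_formula = math.sqrt(b1 * r_y1 + b2 * r_y2)     # Eq. (9.34)
print(abs(r_direct - r_formula) < 1e-9)
```

Correlating the raw Y scores with the predicted standard scores is legitimate here because Pearson's r is unchanged by linear rescaling of either variable.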

Data from a study conducted by Dizney and Gromen (1967), shown in 
Table 9.8, will be used to illustrate multiple prediction and correlation for 
the case of two independent variables. Dizney and Gromen studied the 
relationship of reading proficiency $X_1$ and writing proficiency $X_2$
(both as measured by the Modern Language Association Foreign Language
Proficiency Tests) to course grades in the second quarter of a college
German class. A total of n = 111 students participated. 

TABLE 9.8 SUMMARY DATA FROM A MULTIPLE-PREDICTION STUDY 
(DIZNEY AND GROMEN, 1967)* 

                          Intercorrelations               Standard
                          X_1     X_2     Y      Means    deviations
X_1 (MLA Reading)         1.00    .58     .33    25.55    10.20
X_2 (MLA Writing)                 1.00    .45    63.22    11.91
Y   (German grade)                        1.00    2.61     0.50

The values of $b_1$ and $b_2$ for predicting standard scores on Y from
standard scores on $X_1$ and $X_2$ are found from Eq. (9.31) as follows: 

$$b_1 = \frac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2} = \frac{.33 - .45(.58)}{1 - (.58)^2} = .104,$$

$$b_2 = \frac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2} = \frac{.45 - .33(.58)}{1 - (.58)^2} = .390.$$

Hence, the best estimate of standard scores on Y given $z_1$ and $z_2$ is 

$$\hat{z}_y = .104 z_1 + .390 z_2.$$

From Eq. (9.33) and the data in Table 9.8 we can construct the multiple- 
prediction equation for raw scores: 

$$\hat{Y} = .005 X_1 + .016 X_2 + (2.61 - (.005)25.55 - (.016)63.22) = .005 X_1 + .016 X_2 + 1.47.$$
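
The conversion from standard-score weights to the raw-score equation can be sketched as follows. The function name `raw_score_equation` is our own, and the summary statistics are those of Table 9.8 (we read the standard deviation of $X_1$ as 10.20). Because the sketch carries full precision rather than the rounded weights .005 and .016, its intercept comes out near 1.45 rather than the 1.47 obtained by hand above, which illustrates the rounding problem noted in the footnote to Table 9.8.

```python
def raw_score_equation(b1, b2, mean_y, sd_y, mean_1, sd_1, mean_2, sd_2):
    """Rescale standard-score weights into raw-score weights and an intercept (Eq. 9.33)."""
    w1 = b1 * sd_y / sd_1
    w2 = b2 * sd_y / sd_2
    intercept = mean_y - w1 * mean_1 - w2 * mean_2
    return w1, w2, intercept


# Correlations from Table 9.8.
r_y1, r_y2, r_12 = 0.33, 0.45, 0.58

# Standard-score weights from Eq. (9.31).
b1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
b2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Means and standard deviations from Table 9.8.
w1, w2, intercept = raw_score_equation(
    b1, b2,
    mean_y=2.61, sd_y=0.50,
    mean_1=25.55, sd_1=10.20,
    mean_2=63.22, sd_2=11.91,
)
print(round(b1, 3), round(b2, 3))
print(round(w1, 4), round(w2, 4), round(intercept, 2))
```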


* Dizney and Gromen report their statistics to too few significant figures
for the desirable level of accuracy in the computations here. The r's, for
example, should be to the nearest four decimal places, rather than only two.
They probably used more figures in their computations than they reported to
the reader, however. 

The multiple correlation coefficient, which is the product-moment 
correlation between Y and the best-weighted composite of $X_1$ and $X_2$,
could either be calculated directly or found more conveniently by Eq. (9.34): 

$$R_{y \cdot 12} = \sqrt{b_1 r_{y1} + b_2 r_{y2}} = \sqrt{(.104).33 + (.390).45} = .46.$$

Notice that the best combination of $X_1$ and $X_2$ is scarcely any better 
as a predictor of Y than is $X_2$ alone. Combining $X_1$ with $X_2$ in an
optimal way raises the correlation with Y only from .45 (for $X_2$ alone) to
.46. This “expendability” of $X_1$ arises from the fact that $X_1$ and $X_2$
correlate about equally with Y (with $X_2$ correlating slightly higher), and
$X_1$ and $X_2$ are substantially correlated themselves ($r_{12} = .58$).
The net effect of combining two predictors $X_1$ and $X_2$ increases when
$X_1$ and $X_2$ are both substantially correlated with Y but have a low
correlation with each other. Note below how the value of $R_{y \cdot 12}$
depends upon the value of $r_{12}$: 


Case 1 

          X_1     X_2     Y
X_1       1.0     0       .50
X_2               1.0     .50
Y                         1.0

$b_1 = .50$, $b_2 = .50$ 

$$R_{y \cdot 12} = \sqrt{.50(.50) + .50(.50)} = .71$$

Case 2 

          X_1     X_2     Y
X_1       1.0     .5      .5
X_2               1.0     .5
Y                         1.0

$b_1 = .33$, $b_2 = .33$ 

$$R_{y \cdot 12} = \sqrt{.33(.50) + .33(.50)} = .57$$
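
The dependence of the multiple R on $r_{12}$ can be explored with a short sketch built from Eqs. (9.31) and (9.34). The function names `standardized_weights` and `multiple_r` are our own. Note that the sketch does not round the weights, so Case 2 comes out near .58 rather than the .57 obtained above from the rounded weight .33.

```python
import math


def standardized_weights(r_y1, r_y2, r_12):
    """Least-squares weights for standard scores with two predictors (Eq. 9.31)."""
    denominator = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denominator
    b2 = (r_y2 - r_y1 * r_12) / denominator
    return b1, b2


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of Y with the best-weighted composite of X1 and X2 (Eq. 9.34)."""
    b1, b2 = standardized_weights(r_y1, r_y2, r_12)
    return math.sqrt(b1 * r_y1 + b2 * r_y2)


examples = {
    "Case 1 (r_12 = 0)": (0.50, 0.50, 0.0),
    "Case 2 (r_12 = .5)": (0.50, 0.50, 0.5),
    "Table 9.8": (0.33, 0.45, 0.58),
}
for label, (r_y1, r_y2, r_12) in examples.items():
    print(label, round(multiple_r(r_y1, r_y2, r_12), 2))
```

Trying other values of `r_12` with the same two validity coefficients shows how quickly the gain from a second predictor shrinks as the predictors become more highly intercorrelated.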


The multiple R is substantially larger in the case in which $r_{12} = 0$. As 
you explore the relationships between the correlations of $X_1$ and $X_2$
with Y, the correlation of $X_1$ with $X_2$, and the value of
$R_{y \cdot 12}$, you may have need of an inequality that relates $r_{y1}$
and $r_{y2}$ to $r_{12}$. If $r_{y1}$ and $r_{y2}$ are given, then $r_{12}$
must satisfy the following inequality: 

$$r_{y1} r_{y2} - \sqrt{(1 - r_{y1}^2)(1 - r_{y2}^2)} \le r_{12} \le r_{y1} r_{y2} + \sqrt{(1 - r_{y1}^2)(1 - r_{y2}^2)}. \tag{9.35}$$

This result comes readily by algebraic manipulation of the inequality 
$-1 \le r_{12 \cdot y} \le 1$, where $r_{12 \cdot y}$ is a partial
correlation of the form given in Eq. (9.28). More generally, the correlation
between variables 1 and 2 cannot be outside the limits 

$$r_{13} r_{23} \pm \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}.$$

If $r_{13} = r_{23} = 0$, the limits that $r_{12}$ can attain are the
conventional ±1. If $r_{13}$ and $r_{23}$ both equal 1 (or both equal −1),
$r_{12}$ must be 1; if one equals 1 and the other equals −1, $r_{12}$ must
be −1. Put these equations into words to understand more fully what they
mean. 
55 a “ d Wa ” g (1965 > and °'“ < 19W ) 
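
The limits on $r_{12}$ are easy to tabulate. The following sketch evaluates them for a few pairs of correlations; the function name `r12_limits` is our own label.

```python
import math


def r12_limits(r_13, r_23):
    """Lower and upper limits on r_12 implied by r_13 and r_23."""
    center = r_13 * r_23
    half_width = math.sqrt((1 - r_13 ** 2) * (1 - r_23 ** 2))
    return center - half_width, center + half_width


# With both correlations zero, r_12 is unrestricted between -1 and +1.
print(r12_limits(0.0, 0.0))

# With the Table 9.8 correlations of X1 and X2 with Y (.33 and .45),
# the observed r_12 of .58 must fall between the two limits.
lower, upper = r12_limits(0.33, 0.45)
print(round(lower, 2), round(upper, 2), lower <= 0.58 <= upper)
```

As the two given correlations approach ±1 the interval narrows, until at the extreme it collapses to a single admissible value for $r_{12}$.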
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Interpreting multiple-regression results causally is fraught with hazards. 
Probably sociologists have struggled with this problem more than have 
psychologists or educators. For example, see Coleman et al. (1966). Helpful 
methodological articles have been provided by Werts (1968) and Pugh (1968). 


PROBLEMS AND EXERCISES 

1. The following are characteristics of people and methods of measuring them: 

   Characteristic            Measurement 

A. Sex                       Dichotomous; males = 1, females = 0 
B. Age                       Measured in months to nearest month 
C. Height                    Measured to nearest inch 
D. Political preference      Dichotomous; Democrat = 1, Republican = 0 
E. Anxiety                   Measured by the judgment of clinical psychologists 
                             in ranks from 1 through n for a group of n persons. 
F. Intelligence              Measured by converting IQ scores to the ranks 1 
                             through n for a group of n persons. 

In each of the following instances, identify one or more correlation coefficients 
appropriate for describing the relationship between the two variables in 
question: 

a. Sex and height. 

b. Anxiety and intelligence. 

c. Sex and anxiety. 

d. Age and height. 

e. Sex and political preference. 

f. Political preference and intelligence. 

2. The following data are illustrative of data gathered by Kennedy, Van de Riet, 
and White (1963) and Terman and Merrill (1960). (Also see Jensen, 1968.) 
A sample of n = 100 ten-year-old children was drawn from the population of 
ten-year-olds. Fifty of the children were Negro, and 50 were white. The 
children were given both digit-span and vocabulary subtests. The following 
data were obtained: 


                       Digit span 
Race           Fail (0)    Pass (1)    Total 

White (1)         25          25         50 
Negro (0)         29          21         50 
Total             54          46        100 


                       Vocabulary 
Race           Fail (0)    Pass (1)    Total 

White (1)                                50 
Negro (0)                                50 
Total                                   100 


a. Calculate the phi coefficient between race X and performance on the
digit-span test Y. 

b. Calculate the phi coefficient between race X and performance on the 
vocabulary test Y. 

c. Calculate the maximum possible phi that could occur in each of the two 
situations for the p's there. Hint: $p_{race} = .5$ each time, but
$p_{digit\ span} = .46$ and $p_{vocabulary} = .39$. 

d. Compare the two values of $\phi$ calculated above and attempt to interpret 
the difference, both statistically and substantively. 


3. In a sample of 100 adults, the mean score on an object-assembly test is 104.00 
and the unbiased estimate of the variance of the scores is 256.00. The 50 
women in the group have a mean score of 100.00. Scoring females 0 and 
males 1, calculate the point-biserial coefficient of correlation between sex and 
object-assembly ability. 

4. Data were gathered on a test of writing skill (Y) and dichotomously on verbal 
reasoning (X). The variable X was scored 0 for “below average” and 1 for 
“above average”; Y was measured by taking raw scores on a 70-item test: 

Student    X     Y 

A          0    52 
B          1    52 
C          0    44 
D          0    55 
E          1    58 
F          0    52 
G          0    61 
H          0    38 
I          1    53 
J          0    29 
K          0    40 
L          0    40 
M          0    45 
N          1    59 
O          1    57 
P          1    50 

Calculate $r_{bis}$ as an approximation to the correlation between the
“underlying” verbal-reasoning scores and writing skill. 

5. E^allon (9,11)51.0.1 ibn ft, rath, „( w r- is giv „, by 

f *. _ in, 

r *» urt • 


Y00 are lo, d .ha, », - M. „ - 50 .„d ... - Find ft, 

6 ' onftot! P '^°'? W ‘"" 1 * V” 1 ’ ‘ h "=P« joln.1, ranked r - 20 child™ 
ad,m.™„O V ■ r - OT<,,,0 " al nrljrr'lmcnt (1 _ besl adjustment, 20 - .Orel 
TO,™ a"'. ° f ““""“S " - 20 - "0,1 severe), 

parr, the rhid i ^ d,,t . mct P a,rl of ch,M ren in this group. In 60% of these 
pairs, the child in the pair who had a higher rank on X also had a higher rank 
on Y; in the remaining 20% of the pairs, the child with the higher rank on X had 
a lower rank on Y than the other child of the pair. What is the value of Kendall’s 
$\tau$ for these data? 

7. For a sample of 89 children, Kabot (see Delacato, 1966, Chap. 14) gathered 
dichotomous data on reading performance X and laterality Y (consistency with 
which one “side" — eye, hand, foot— of the body is used). Data were in the 
form of judgments of poor (0) and good (1) reading performance and low (0) 
and high (1) consistency in the use of one side of the body: 


49 

40 

89 


Assume that normally distributed variables underlie both dichotomies and that 
their joint distribution is bivariate-normal. Calculate the value of
$r_{tet}$ for the above data as an approximation to the product-moment
correlation of reading performance and laterality. 

8. The raw scores of 12 high-school students on tests of abstract and verbal 
reasoning are reported below: 


             Abstract      Verbal 
Student      reasoning     reasoning 

A               40            37 
B               49            42 
C               44            25 
D               42            40 
E               24            19 
F               48            39 
G               36            27 
H               25            14 
I               45            43 
J               28            16 
K               31            20 
L               39            35 

a. Convert the raw scores to ranks (1-12) for each variable and calculate $r_s$. 

b. Using the same ranks generated in (a), calculate $\tau$. 

9. Themes written by ten pupils were judged to be either “creative” (X = 1) 
or “not creative” (X = 0). A ranking of the same students on intelligence 
Y was available (10 = highest, 1 = lowest). Calculate the rank-biserial 


Reading performanc 
Poor (O) Good (1 ! 


High (1) 
Low (0) 


18 

3! 

26 | 

12 j 
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correlation coefficient from these data: 

Student 


10. In a group of female physical education students Leyman (1967) found the 
following intercorrelations of measures of grade-point average X, the Scott 
Motor Ability Test Y, and intellectual aptitude Z: 

x r z 


* i 

a - ra,,d gpa in ph3 ' 5i “' 

b. Calculate the multiple correlation coefficient $R_{x \cdot yz}$, i.e., the 
correlation of physical education GPA with the optimal linear combination 
of intelligence and motor ability. 

c. Try to interpret these results. 

Ability TetMSClATl^ the a "’°"E «“ School and College 

■ad ACbi ' >em '”' (FT0A ^ 

Intercorrelations 
SCAT FTCAT CPA 


Of A 

r - . . 1 1.00 

• *”■ *— r. 

determine the weights 6, and h f nr k, ■ ^ ^ rGAT In the process. 

fth r eri °n's standard score on V fr ° the ,ea st -squares estimate of the 

, , ~ r , " Y from h,s star, dard scores on X x and X t . 

,z * The correlation or Y with X is < 3 t- rK. 

,hC correlation of with ° f K Wi,h U - 87 ‘ 010 
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probability 


10.1 

INTRODUCTION 


The scientist is always attempting to go beyond his data. Even when he 
is most objective and most hesitant to generalize, he makes tacit assumptions 
that his set of data has some stability: if he gathered more data tomorrow 
it would reflect approximately, if not exactly, the same trend. When 
least objective, he makes wide general inferences from what he sees today to 
what he will see in different persons, under different conditions, tomorrow. 
The likelihood of an inference's being invalid may rush upon us, as when we
hear: “The Yankees will win tomorrow because I saw them beat the Sox 
today.” Other inferences are more cogent: “I’ve noted that the sun has 
risen every day for the past twenty years, hence it will also rise tomorrow 
morning.” Inferences vary in their likelihood of being valid all the way 
from “extremely unlikely” to “almost certain.” By its very nature, no
inference can be known to be valid, although some approach certainty. 
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Much of the work of the statistician is derivation of methods that assign 
probabilities to inferences. This is indeed an important occupation. In- 
ferential reasoning is the method of science; what a clumsy method it would 
be if scientists had no objective and systematic means of assigning prob- 
abilities to the inferences they make. The language of everyday life with which 
people refer to inferences as “extremely unlikely" or "almost certain" does 
not serve science adequately. These subjective estimates vary from person 
to person depending on the words chosen to express the likelihood and on 
what the words themselves mean. It is far better for communication among 
scientists that they can independently arrive at the same statement of the 
probability of the validity of an inference, and that the statement can be 
made in unambiguous terms carrying the same meaning for all scientists. 

We have idealized the actions of scientists only slightly in the above 
paragraphs to make the point stronger. Scientists and statisticians are not 
unanimous on the questions of how to assign probabilities to statements and 
to which statements probabilities should be assigned. Nonetheless, although 
they may construct adjacent but nonintersecting systems, their preference 
for one system springs from a value judgment about what the role of the 
S °“f h ‘ 'a !y! " mS are °I*n for all to sec. and anyone 

Ti w ' ! ~ °f “nderstandmg them may. Although statisticians do not 
toTnfi.nl ' ; hc 1 P ro P"" wh ‘» ) . methods or assigning probabilities 

probabThtv Tlatue T™™"? in ,his " xl d ' als with signing * 

years of age and t mat* ^ ,nt avcra S e ,s n °t random in children 1 1 
OWoutlf Z ITT™ l h,S S “' m ' nt wi,h Polity -59 that it it true.” 
convention! that conslittdTthe hT ! h '.' aSk ° r ,earain 8 thc mathematical 
cannot expect to l“m aJ aW , of Polity theoty. You 

for probability is a large and ^ , a 1 J” lty ln l ^ c space that is available, 
can Teant enough to makielem. 0 !^ ” body or knowledge. Hopefully, you 
gh make eiementary s ,a„ s „eal methodulogy meaningful. 

10.2 

PROBABILITY AS A 
MATHEMATICAL SYSTEM 

Somewhat arbitrarily, probability will be looked at as a system of
definitions and operations pertaining to a sample space. The idea of a
sample space is basic. We shall hardly ever make a probability statement 
that is not related to a sample space of some sort. Indeed, statements of
probability are simply statements about sample spaces and their
characteristics.* 

A sample space is a set of points. These points can represent anything: 
persons, numbers, balls, etc. An event is an observable happening like the 
appearance of “heads” when a coin is flipped or the observation that a person 
whose name you have selected at random from a telephone book is watching 
television. There may be several points in the sample space, each of which is 
an example of an event. For instance, the sample space may be a set of 6 
white and 3 black balls in an urn. This sample space has 9 points. An event 
might be “A ball is white.” This event has 6 sample-space points. How 
many points in the sample space does the event “A ball is black” have? 
The event “A ball in this urn is red” has no sample points. “A ball in this 
urn is either white or black” is also an example of an event. Notice that 
many different events can be defined on the same sample space. 

A statement of probability is made about the occurrence of an event 
that is associated with a sample space. For the next few pages, we shall let a 
capital letter, A, B, C, . . . , stand for an event; the “probability of the event 
A ” will be denoted by P(A). 

Definition: The probability of the event A, P(A), is the ratio of 
the number of sample points that are examples of A 
to the total number of sample points, provided all 
sample points are equally likely. 

Let A be the event “A ball is white,” where the sample space is the set 
of 9 balls (6 white, 3 black) in an urn. How many sample points are examples 
of the event A? Obviously, the answer is 6. What is the total number of 
sample points? 9. Hence, the probability of the event A (“A ball is white”) is 

$$P(A) = \frac{\text{number of examples of } A}{\text{total number of sample points}} = \frac{6}{9} = \frac{2}{3}.$$
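
The counting definition can be sketched in Python, under the stated proviso that all sample points are equally likely. The function name `probability` and the way the urn is represented as a list of colors are our own choices for illustration.

```python
from fractions import Fraction


def probability(sample_space, is_example):
    """Ratio of sample points that are examples of an event to all sample points.

    Assumes every sample point is equally likely.
    """
    favorable = sum(1 for point in sample_space if is_example(point))
    return Fraction(favorable, len(sample_space))


# The urn of the example: 6 white balls and 3 black balls, one sample point per ball.
urn = ["white"] * 6 + ["black"] * 3

p_white = probability(urn, lambda ball: ball == "white")
print(p_white)  # 2/3
```

Other events on the same sample space can be evaluated by passing a different test in place of `lambda ball: ball == "white"`.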

If B is the event “A ball is black” in our example, find P(B). Add 
together P(A) and P(B). In our example, what is P(C) if C is the event “A 
ball is red”? D is the event “A ball is either white or black.” What is 
P(D)? E is the event “A ball is both white and black.” Find P(E). 

Suppose you have a second urn which has 4 white balls in it and an 
unspecified number of black balls. What is the probability of the event 
that a ball is white? You cannot know; at least, within the system we have 
developed so far, the question cannot be answered. A probability statement 
can be made if you know fully the characteristics of the sample space, but 
in this example you do not. 

* The notion of a sample space is actually a relatively recent development in
probability theory, dating back only to the 1920's and the work of von Mises
(1931). 

There exists an alternative route by which we may establish the definition
of the probability of an event. Consider a sample space composed 
of a countable number of elementary events. An elementary event is a sample 
point. Denote each sample point by a lower case letter a: $a_1, a_2, \ldots$. 
Every event defined on the sample space is composed of a set of elementary 
events. 


Definition: A probability function is a rule of correspondence 
that associates with each event A in the sample 
space a number P(A) such that (1) P(A) ≥ 0, for 
any event A; (2) the sum of the probabilities for all 
distinct elementary events is 1; (3) if A and B are mutually 
exclusive events, i.e., have no sample points in 
common, then P(A or B) = P(A) + P(B). 


If we assume that the probability of an elementary event $a_i$ is 1/n, where 
n is the total number of sample points, then the probability of the event A 
that is composed of $n_1$ sample points is 

$$P(A) = \frac{1}{n} + \frac{1}{n} + \ldots + \frac{1}{n} = \frac{n_1}{n}.$$

The quantity $n_1/n$ is the ratio of the number of sample points that are
examples of A to the total number of sample points. 

Both routes bring us to the same definition for P(A). While the latter 
definition might have the preference of the mathematician, the former 
definition of P(A) will probably seem clearer to you. 


10.3 

COMBINING PROBABILITIES 


Suppose we have an urn that contains 4 red, 3 white, and 3 black balls. 
Three events might be of interest to us: (1) A, a ball is red; (2) B, a ball is 
white; (3) C, a ball is black. These three events are mutually exclusive: 
each sample point is an example of one and only one event. 

The question arises, “What is the probability that a ball is red or white?” 
We shall denote this event by the symbol $A \cup B$ and its probability by 
$P(A \cup B)$. 

First Addition Rule of 
Probabilities 

When the events A and B are mutually exclusive, $P(A \cup B)$, the
probability of either A or B, is P(A) + P(B). 
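
For the urn of 4 red, 3 white, and 3 black balls, the rule can be checked by counting sample points directly. This is only a sketch; the function name `probability` and the representation of the urn as a list of colors are our own, and every ball is assumed equally likely.

```python
from fractions import Fraction

# One sample point per ball: 4 red, 3 white, 3 black.
urn = ["red"] * 4 + ["white"] * 3 + ["black"] * 3


def probability(colors):
    """Probability that a ball has one of the given colors, by counting sample points."""
    favorable = sum(1 for ball in urn if ball in colors)
    return Fraction(favorable, len(urn))


p_red_or_white = probability({"red", "white"})        # counted directly
p_sum = probability({"red"}) + probability({"white"})  # First Addition Rule
print(p_red_or_white, p_sum)  # 7/10 7/10
```

The two results agree because no ball is both red and white, i.e., the events are mutually exclusive.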
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In our example, 

$$P(A \cup B) = P(A) + P(B) = \tfrac{4}{10} + \tfrac{3}{10} = \tfrac{7}{10}.$$

Find $P(A \cup C) = P(A) + P(C)$. What is the value of $P(B \cup C)$? 

In some sample spaces two events may not be mutually exclusive, i.e., 
a single sample point may be an example of both events A and B. Let's 
consider the possible outcomes (“heads” or “tails”) of flipping a coin three 
times. We shall consider the outcomes abstractly; as yet we don't 
want to talk about the physical act of flipping a coin. The eight possible 
outcomes are the sample space. 


1. HHH          5. TTT 
2. HHT          6. TTH 
3. HTH          7. THT 
4. HTT          8. THH 


We shall define two events A and B on the above sample space: 

A: “heads” on flips 1 and 2 
B: “heads” on flips 2 and 3 

The sample points that are examples of event A are the first two outcomes 
(numbers 1 and 2 above). Two sample points are examples of event B. 
Which two are they? 

We shall let the symbol $A \cap B$ denote the new event, “A and B.” In 
our example $A \cap B$ is “heads” on flips 1 and 2 and “heads” on 
flips 2 and 3. If we assume that all the eight sample points are equally 
likely, then the probability of the event $A \cap B$ is as follows: 

$$P(A \cap B) = \frac{\text{number of sample points that are examples of } A \cap B}{\text{total number of sample points}}.$$

The total number of sample points is 8. Only one sample point is an 
example of the event “heads” on flips 1 and 2 and “heads” on flips 2 and 3. 
Which one is it? So the probability of $A \cap B$ is 1/8. 

Look back at the First Addition Rule of Probabilities and you'll see 
the assumption that A and B are mutually exclusive. In the 
example just discussed A and B are not mutually exclusive. The outcome 
HHH was an example of both A and B. We need a rule that defines 
what the probability of A or B, $A \cup B$, is when 
A and B are not mutually exclusive. 
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Second Addition Rule of 
Probabilities 

The probability of either event A or event B or both, P{A U B), is 
P(A) + P(B) - P(A n B). 

This expression might look strange; a Venn diagram, an illustration of
the relationship between events defined on sample spaces, should help clear
up the mystery of the term “−P(A ∩ B).” See Fig. 10.1, where the events
A and B, two overlapping groups of sample points, are depicted in the sample
space S. We’ll assume that the probability of event A is the area of circle A,
and that the probability of event B is the area of circle B.

The probability of A or B or both is that area covered by the intersecting
circles A and B. The shaded portion in Fig. 10.1 is that set of sample points
in both events A and B, i.e., those points that are examples of A ∩ B.

How do we find the area covered by A and B? We first find the area
of A that is not shared by B, add to it the area of B not shared by A, and
then add the area of A and B:

P(A ∪ B) = [P(A) − P(A ∩ B)] + [P(B) − P(A ∩ B)] + P(A ∩ B).

The first two terms on the right side of the above equation give the area
of A not in common with B. Terms three and four give the area
of B not in common with A. We complete the area we want to
find by adding the area common to A and B, P(A ∩ B). The above
equation simplifies to

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

If P(A) and P(B) were simply added, the portion common to A and B,
P(A ∩ B), would contribute twice to the
sum. It must contribute only once, so consequently it must be subtracted
once.

FIG. 10.1 Venn diagram of the
intersecting events A and B in the
sample space S.

FIG. 10.2 Venn diagram of the
mutually exclusive events A and B in
the sample space S.

In general, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). If A and B are
mutually exclusive, then P(A ∩ B) = 0. Therefore, if A and B are mutually
exclusive, then

P(A ∪ B) = P(A) + P(B) − 0 = P(A) + P(B).
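The Second Addition Rule can be checked by brute force on the three-flip sample space used above. The following is only a sketch in Python (a language this text does not otherwise use); the helper name `prob` and the variable names are our own, and all eight sample points are assumed equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: the eight equally likely outcomes of three coin flips.
sample_space = list(product("HT", repeat=3))

def prob(event):
    """Probability of an event (a set of sample points), all points equally likely."""
    return Fraction(len(event), len(sample_space))

# A: "heads" on flips 1 and 2;  B: "heads" on flips 2 and 3.
A = {s for s in sample_space if s[0] == "H" and s[1] == "H"}
B = {s for s in sample_space if s[1] == "H" and s[2] == "H"}

p_union_direct = prob(A | B)                     # count the points in A or B
p_union_rule = prob(A) + prob(B) - prob(A & B)   # Second Addition Rule

print(prob(A & B), p_union_direct, p_union_rule)
```

Counting the points of A ∪ B directly and applying the rule give the same value, because the single point common to A and B is subtracted once.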

So far we have treated probability rather abstractly. In much the same 
manner as the mathematical systems of geometry and algebra, probability 
theory can be developed from a small set of axioms and definitions. But
also in the same manner as geometry and algebra, probability theory can 
serve as a model for what is going on in a certain class of events in the world 
around us. Geometry has applications to surveying and architecture. The 
numerous applications of algebra are obvious. Probability theory also 
applies to a certain group of events. 

The principle that relates probability statements to physical events is 
due to James Bernoulli (1654-1705). An example of the application of a 
formal probability statement to an actual set of actions should suffice to 
show the relationship between theory and application. 

Suppose we have an urn that contains 4 white and 6 black balls. The 
balls are identical in size, shape, and weight and thoroughly mixed so that if 
we were to reach in and pull one out, it is equally likely that any one of the 
ten balls would be selected; each ball has one chance in 10 of being chosen. 
We reach into the urn, pull out a ball, and record its color. The ball is 
returned to the urn, the balls in the urn are stirred thoroughly, and the act 
is repeated under the same conditions. We perform this act a very large 
number of times, say 10,000. After the 10,000th drawing of a ball, suppose we
count the number of times a white ball was drawn. Your intuition would 
tell you that the ratio of the number of times a white ball is drawn to 10,000 
will be very close to 4/10. The ratio will very probably not be exactly 4/10 
but it will be close. 

If we regard the 10 balls as a sample space, and if we say that A is the
event “a ball is white,” then P(A) is 4/10, exactly. The question arises,
Will the formal probability of an event as calculated from theory correspond
closely to the relative frequency of the occurrence of the event? The answer
to this question is the key to the relationship of probability theory and its
application, and the answer is “yes.”

We shall attempt a formal statement of this relationship. Suppose an
event A either does or does not occur on every trial of an act. The probability
that A will occur is the same for all trials of the act and equals P(A).
For example, the act may be flipping a symmetrical coin, A may be the event
“heads,” and it is assumed that the probability of heads, 1/2, is the same
from one flip to the next. It is also assumed that every trial is independent
of (in no way affected by) every other trial. Now after n trials of the act,
the proportion of times A has occurred is p. It can be proved (the proof is
so difficult there is no point presenting it here) that p gets closer and closer
to P(A) as n becomes larger and larger. We can make the proportion of
times A occurs as close as we want to P(A), the probability calculated from
the sample space, by performing the act enough times. So P(A) tells what
will happen in the long run if we actually performed the actions under the
conditions laid down above.


The preceding paragraph is a verbal statement of the law of large
numbers. The law of large numbers is important for the application of
probability, and since statistics is one such application, it is important for
statistical inference. In spite of its importance, we shall have little more to
say about the law of large numbers.
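The urn experiment described above is easy to imitate on a computer. The sketch below is an illustration only: it assumes Python's pseudo-random number generator is an adequate stand-in for thorough stirring of the urn, and the function name `proportion_white` and the seed are our own choices.

```python
import random

def proportion_white(n_draws, seed=0):
    """Draw n_draws balls with replacement from an urn of 4 white and 6 black
    balls and return the proportion of draws that were white."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    urn = ["white"] * 4 + ["black"] * 6
    whites = sum(rng.choice(urn) == "white" for _ in range(n_draws))
    return whites / n_draws

# The proportion should settle near P(A) = 4/10 as the number of draws grows.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_white(n))
```

For small n the proportion may wander well away from 4/10; for large n it will very probably be close, which is what the law of large numbers asserts.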


Multiplicative Rule of 
Probabilities 


There is a multiplicative rule for probabilities that will be of
considerable use in later work. Suppose we are flipping a coin, that the
probability of “heads” is 1/2 on each flip, and that the flips are
independent. The multiplicative rule for probabilities
implies that the probability of getting five straight “heads” is
1/2 · 1/2 · 1/2 · 1/2 · 1/2 = 1/32. A general statement of the rule follows:

The probability that n independent events will all occur is the product
of the probabilities of the n separate events.

The following example draws on the rules developed so far.
Suppose a die is fair, so that the probability of any
one of the 6 faces coming up is 1/6. What is the probability
that an even number will appear on a single toss?

P(even number) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2.

Suppose we consider two rolls of the die. The sample space of possible
outcomes has 36 points, as shown in Fig. 10.3. Let event A be “a 1 on toss 1”
and event B “a 2 on toss 2.” Find P(A ∩ B) by dividing the number of
sample points that are examples of A ∩ B (both A and B) by 36. Verify
that P(A ∩ B) is equal to P(A) · P(B). Find P(A ∪ B), the probability of
event A or event B, remembering that P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Here, this is 6/36 + 6/36 − 1/36 = 11/36; the point 1, 2 is common
to A and B.

FIG. 10.3 Sample space of outcomes of tossing a die twice.

1, 1    2, 1    3, 1    4, 1    5, 1    6, 1
1, 2    2, 2    3, 2    4, 2    5, 2    6, 2
1, 3    2, 3    3, 3    4, 3    5, 3    6, 3
1, 4    2, 4    3, 4    4, 4    5, 4    6, 4
1, 5    2, 5    3, 5    4, 5    5, 5    6, 5
1, 6    2, 6    3, 6    4, 6    5, 6    6, 6
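The two-toss exercise can be verified by listing the 36 sample points and counting. This is a sketch in Python under the assumption that all 36 points are equally likely; the names `space` and `prob` are ours.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely points (toss 1, toss 2).
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as (points in the event) / (points in the space)."""
    return Fraction(len(event), len(space))

A = {pt for pt in space if pt[0] == 1}   # a 1 on toss 1
B = {pt for pt in space if pt[1] == 2}   # a 2 on toss 2

print(prob(A & B), prob(A) * prob(B))    # multiplicative rule: the two agree
print(prob(A | B))                       # P(A or B) by direct counting
```

The first line of output shows that P(A ∩ B) equals P(A) · P(B) for these two events; the second gives the 11/36 obtained in the text from the Second Addition Rule.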

We shall accept as a definition the statement that two events are
independent if and only if P(A ∩ B) = P(A) · P(B). Independence is an
important concept in statistics and probability, and we shall have more to say
about it later.


10.4 

PERMUTATIONS AND 
COMBINATIONS 

Two additional concepts that crop up repeatedly when one begins examining 
the outcomes of experiments are permutations and combinations. 

A permutation of a set of objects (the letters A, B, C, and D, for example) 
is an arrangement of them in which their order is considered. A different 
ordering of the objects is a different permutation. How many different 
permutations (orderings) are there of the letters A, B, C, and D? To find
out we can set about the task of writing them down and counting them, as 
shown in Table 10.1. 

The first letter can be either A, B, C, or D. Suppose it is A (this puts 
us in the top fourth of Table 10.1). If the first letter is A, the second letter 
can be either B, C, or D. If the second letter is B, then the third letter can 
be either C or D. If the third letter is C, then the fourth letter must be D. 
So ABCD is one possible permutation. There are four possible letters for 
the first position; after one letter is assigned to the first position, there are 
three possible letters for the second position; etc. Hence, the number of 
possible permutations of the four letters A, B, C, and D is 4 · 3 · 2 · 1 = 24.
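Writing out and counting the orderings can be delegated to a computer. The sketch below uses Python's standard `itertools.permutations`; it is offered as a check on the counting argument, not as part of the original presentation, and the variable name `orderings` is ours.

```python
from itertools import permutations
from math import factorial

# Every ordering of the four letters, each written as a string such as "ABCD".
orderings = ["".join(p) for p in permutations("ABCD")]

print(len(orderings))      # number of orderings found by listing them
print(factorial(4))        # the product 4 * 3 * 2 * 1
print(orderings[:6])       # the six orderings whose first letter is A
```

The list has as many entries as the product 4 · 3 · 2 · 1, and one fourth of them begin with each letter.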

If we have n distinct objects we can make n(n − 1)(n − 2) ··· 2 · 1
different permutations of them. Instead of writing n(n − 1)(n − 2) ··· 2 · 1
we can denote this product by n!, read “n factorial.” n! is the product of
the numbers from 1 through n and equals the number of permutations of n
distinct objects. (We agree to let 0! equal 1.)

It can be shown fairly easily, and you should satisfy yourself that it is
true, that

\[ n(n - 1)(n - 2) \cdots (n - n_1 + 1) = \frac{n!}{(n - n_1)!}. \]

Merely write n! as

n(n − 1)(n − 2) ··· [(n − n₁) + 1](n − n₁)[(n − n₁) − 1] ··· 1.

Write out n! in the numerator and (n − n₁)! in the denominator, cancel
the terms that are common to both numerator and denominator, and express
(n − n₁)[(n − n₁) − 1] ··· 1 as (n − n₁)!. Then perform the division.

Consequently,

\[ \binom{n}{n_1} = \frac{n!}{n_1!\,(n - n_1)!}. \]

To summarize, the number of permutations of n objects is n!. The number
of permutations of n objects taken n₁ at a time is

n(n − 1) ··· (n − n₁ + 1).

The number of combinations of n things taken n₁ at a time is given by

\[ \binom{n}{n_1} = \frac{n!}{n_1!\,(n - n_1)!}. \]


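For example, the number of combinations of 5 things taken 3 at a time is

\[ \binom{5}{3} = \frac{5!}{3!\,(5 - 3)!} = \frac{5!}{3!\,2!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)} = \frac{5 \cdot 4}{2 \cdot 1} = \frac{20}{2} = 10. \]

Note that there are 5 · 4 · 3 = 60 permutations of 5 things taken 3 at a time
because each of the 10 combinations of 5 things taken 3 at a time has
3 · 2 · 1 = 6 permutations.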
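The relationship between combinations and permutations can be confirmed by enumeration. The following sketch assumes Python 3.8 or later (for `math.comb` and `math.perm`); the five “things” are arbitrarily labeled A through E.

```python
from itertools import combinations, permutations
from math import comb, perm

things = "ABCDE"

# Count by listing, then compare with the formulas n!/(n1!(n-n1)!) and n!/(n-n1)!.
n_comb = len(list(combinations(things, 3)))   # order ignored
n_perm = len(list(permutations(things, 3)))   # order considered

print(n_comb, comb(5, 3))
print(n_perm, perm(5, 3))
print(n_perm // n_comb)    # orderings of each selection of 3 things
```

Each selection of 3 things can be ordered in 3 · 2 · 1 ways, which is why the number of permutations is six times the number of combinations.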

Suppose 10 men are available to serve on a committee. The committee must be
composed of only seven men. How many different committees could be
formed from the 10 available men? Let n = 10 and n₁ = 7; then evaluate

\[ \binom{10}{7}. \]


Suppose a man has n different flavors of ice cream. He
can sell them individually, he can mix any two different flavors, or he can
mix any three different flavors. How many different
flavors (and combinations of flavors in a mix) can he make? You must evaluate
three separate quantities of the form $\binom{n}{n_1}$ to solve this problem.

Practically every topic discussed so far in this chapter can be applied to
the solution of a problem in statistics. The following problem concerns
the repeated independent performance of an act that can result in either
“success” or “failure” with a constant probability. We shall develop a
binomial distribution that will describe one aspect of the results of such a
series of acts, namely, the probability of a given number of successes.

Suppose a fair coin, i.e., symmetrical and of homogeneous composition, 
is to be flipped five times in succession. Each flip is independent of every 
other flip in the sense that if the first flip results in “heads,” then a “heads” 
on the second flip is no more nor less likely than if the first flip had given a 
“tails.” Assume further that the probability of “heads” is the same from 
the first to the fifth flip. The five flips are five “trials”; independent trials 
that can result in one of two outcomes with a constant probability are called 
Bernoulli or binomial trials. (James Bernoulli was a 17th-century mathe- 
matician whose Ars Conjectandi was one of the first treatises on probability 
theory.) The five coin flips are five binomial trials. How does one find 
the probability that three “heads” will result in five trials? 

One possible outcome of the five flips that gives three “heads” and two
“tails” is H, H, H, T, T. From the multiplicative rule of probabilities, we
know that the probability of H, H, H, T, T is 1/2 · 1/2 · 1/2 · 1/2 · 1/2 = 1/32
since the probability of “heads” and the probability of “tails” are both 1/2.
However, the sample space of outcomes of the five flips has 32 points
(2 · 2 · 2 · 2 · 2), and several of these outcomes are examples of the event
“three ‘heads’ and two ‘tails’.” We can find the number of different points
that are examples of the event in question by using the concept of
combinations. Further examples of the event are H, H, T, T, H and H, T, T,
H, H. The total number of distinct outcomes that have three “heads” and
two “tails” is the number of combinations of five things taken three at a
time,


\[ \binom{5}{3} = \frac{5!}{3!\,2!} = \frac{4 \cdot 5}{2 \cdot 1} = 10. \]

There are 10 sample points that are examples of the event and each has
probability 1/32; consequently, we use the addition rule of probabilities so
that the probability of obtaining three “heads” in five flips of a fair coin is
10(1/32) = 10/32.

In summary, 10/32 was found by multiplying out

\[ \binom{5}{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2}. \]

In general, the probability of obtaining n₁ “successes” in n binomial
trials is equal to

\[ \binom{n}{n_1}\, p^{n_1} q^{\,n - n_1}, \]

where p is the probability of a “success” on any one trial,
q is the probability of a “failure” and equals 1 − p, and
n₁ = 0, 1, 2, . . . , n.

Note that “binomial trials” refers only to trials that have only two
possible outcomes: yes or no, success or failure, heads or tails, red or not red,
etc.
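The general expression translates directly into a few lines of code. The sketch below is one possible rendering in Python; the function name `binomial_probability` is our own, and exact fractions are used so that results such as 10/32 are not blurred by rounding.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, n1, p):
    """Probability of exactly n1 successes in n binomial trials,
    where p is the probability of success on any one trial."""
    p = Fraction(p)
    q = 1 - p                                  # probability of failure
    return comb(n, n1) * p**n1 * q**(n - n1)

# Three "heads" in five flips of a fair coin (prints the reduced form of 10/32).
print(binomial_probability(5, 3, Fraction(1, 2)))

# The probabilities for n1 = 0, 1, ..., n must add to 1.
print(sum(binomial_probability(5, k, Fraction(1, 2)) for k in range(6)))
```

Passing p as a `Fraction` keeps the arithmetic exact; a decimal such as 0.5 would also work but would be converted to the nearest binary fraction.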

A die is tossed four times, and we wish to find the probability of three
“sixes.” In this example, n = 4, n₁ = 3, and we take the probability of a
“six” on any trial, p, to be 1/6. Hence, q = 1 − p = 5/6, the probability
of not a 6. The probability of 3 “sixes” thus equals

\[ \binom{4}{3}\left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right)^{1} = \frac{4!}{3!\,1!}\left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right) = \frac{4 \cdot 5}{6^{4}} = \frac{20}{1296}. \]

The expression

\[ \text{probability}(n_1) = \binom{n}{n_1}\, p^{n_1} q^{\,n - n_1} \]

is called the binomial distribution.

An interesting relationship exists between the term $\binom{n}{n_1}$ in the binomial
distribution and Pascal’s triangle (Fig. 10.4), named for the French
mathematician and philosopher Blaise Pascal. Each number in the triangle is
the sum of the two numbers diagonally
above it. Thus, the 6 in row 4 is the sum of the 3 and 3 above it. You could
easily construct the eighth, ninth, tenth, and subsequent rows yourself.

The term $\binom{n}{n_1}$ equals the (n₁ + 1)st number in row n. For example,

\[ \binom{3}{1} = \frac{3!}{1!\,2!} = 3 \]

equals the second number in row 3. Note that the first number in each row
is 1 because $\binom{n}{0} = 1$.

Row
0     1
1     1  1
2     1  2  1
3     1  3  3  1
4     1  4  6  4  1
5     1  5  10  10  5  1
6     1  6  15  20  15  6  1
7     1  7  21  35  35  21  7  1

FIG. 10.4 Pascal’s triangle.


Calculating, for example, the probability of 14 successes in 30 trials 
when p = 3/8 and q = 5/8 would be an arduous task. Fortunately, excellent 
tables for the binomial distribution exist. We recommend you use them 
whenever your calculations are more complicated than the elementary ones 
in this chapter. A brief table for the binomial distribution appears in Pearson 
and Hartley (1966). Much more extensive tables have been published. See 
Staff, Harvard University Computational Laboratory (1956) and “Tables of 
the Cumulative Binomial Probabilities” (1953). (The “cumulative binomial”
is simply the sum of successive terms in the binomial distribution and gives
the probability of at least n₁ successes in n trials.)

Figure 10.5 illustrates the entire binomial distribution for eight trials
when p = q = 1/2. Note that when p = q the distribution is symmetrical.
For p > q it will be skewed to the left because high values will predominate.
How will it be skewed for p < q?

In Fig. 10.6 the binomial distribution for n = 5 and p = .30, q = .70
appears. You can see from Fig. 10.6 that the probability of five successes
when n = 5 and p = .30 is very small. It is only (.30)⁵ = .00243, i.e., 243
times in 100,000. If you watched five trials of some event and you had no
idea what the probability of success was (p was unknown), and you observed
five successes, would you think it likely that p = .30? What values of p
would make five successes in five trials likely, say, would yield a probability
of at least .5? (Hint: Solve p⁵ = .5 for p. Use logarithms, if you know how.
Otherwise, try various large values of p. Then find p for which p⁵ > .5.)
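The hint can be carried out numerically. The sketch below follows the logarithm route: since p⁵ = .5 means 5 log p = log .5, we divide log .5 by 5 and take the antilogarithm. It is written in Python as an illustration only.

```python
import math

# Solve p**5 = .5 with logarithms: 5 * log(p) = log(.5).
p = math.exp(math.log(0.5) / 5)
print(round(p, 4))

# Probability of five successes in five trials when p = .30.
print(round(0.30 ** 5, 5))
```

Any p larger than the value printed first makes five successes in five trials more likely than not.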

FIG. 10.5 Binomial distribution for n = 8, p = 1/2. (Horizontal axis: number of successes, 0 to 8.)

FIG. 10.6 Binomial distribution for n = 5, p = .30.

The binomial distribution affords a good example of the general type
of reasoning used in testing statistical hypotheses. A statistician may feel
that the process by which the data he has observed were generated was of a
particular kind. Suppose he has flipped a coin ten times and observed
nine “heads.” He suspects that the coin is not fair, i.e., the probability
of “heads” is not 1/2. The nine “heads” appear to be evidence that
the coin is unfair. He formalizes his evidence in the following manner:

1. If the probability of “heads” is 1/2, what is the probability that I
would have observed nine “heads” in 10 flips, or a result even
more extreme (ten “heads”)?

2. This probability is

\[ \binom{10}{9}\left(\frac{1}{2}\right)^{9}\left(\frac{1}{2}\right)^{1} + \binom{10}{10}\left(\frac{1}{2}\right)^{10}\left(\frac{1}{2}\right)^{0} = .0107. \]

(Recall that any number with an exponent of zero equals 1.)

3. If this coin is fair, I have observed an event that is extremely unlikely.

4. If this coin is biased in favor of “heads,” the event I observed
would be much more likely.

5. Because of these considerations it is very unlikely
that this coin is fair; the hypothesis that it is fair is rejected:
the coin appears to be biased in favor of “heads.”

Notice that it is indeed possible to obtain nine “heads” in 10 flips of
a fair coin; in fact, we should expect it to happen about once (1.07 times)
in every 100 sets of 10 flips. The possibility exists that the coin that yielded
nine “heads” actually is a fair coin, but the probability that this is true is
small.
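The .0107 figure can be reproduced by summing the binomial terms for nine and ten “heads.” The sketch below does so in Python; the function name `prob_at_least` is ours, and the biased-coin probability of .9 in the last line is a purely hypothetical value chosen for illustration, not one given in the text.

```python
from fractions import Fraction
from math import comb

def prob_at_least(n, k, p):
    """Probability of k or more successes in n binomial trials."""
    p = Fraction(p)
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(k, n + 1))

# Nine or more "heads" in 10 flips if the coin is fair.
tail_fair = prob_at_least(10, 9, Fraction(1, 2))
print(tail_fair, round(float(tail_fair), 4))

# The same event under a hypothetical coin with P("heads") = .9 (an assumed value).
tail_biased = prob_at_least(10, 9, Fraction(9, 10))
print(round(float(tail_biased), 4))
```

The contrast between the two printed probabilities is the heart of the argument: the observed result is rare under the fair-coin hypothesis and common under a strongly biased alternative.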

The experimenter’s logic often follows the same pattern as the reasoning 
above. He hypothesizes that certain features of a model for his experiment 
(the binomial distribution is one such model) have certain values (e.g., the 
probability of “heads” is 1/2). He needn’t believe this hypothesis; he only 
entertains it temporarily to see where it leads him. The experimenter then 
takes his observations (e.g., flips the coin 10 or 20 times), after which he 
calculates the probability of obtaining the actual result he did obtain, or 
a more extreme one, if his original hypothesis was true. If this probability 
is very small (say, .05 or .01), he questions the truth of the hypothesis. 
If his original hypothesis was false and some alternative hypothesis was 
true (e.g., the coin is biased), his results might be much more likely. This 
process, called “hypothesis testing,” is one of rejecting or failing to reject 
some explanation (hypothesis) for the obtained results on the basis of the 
probability of obtaining the particular results if the hypothesis were true. 
If the probability of the obtained results (or more extreme ones) is large 
(for example, .30 or more) when one calculates this probability under a 
certain hypothesis, then the hypothesis is not rejected at that moment. 

If the probability of the obtained results (or more extreme ones) is small 
(.05 or less) when this probability is calculated under another hypothesis, 
then this hypothesis is rejected. It has not been proven that the hypothesis 
is false; there is some small probability (.05 or less) that it is true. However, 
the evidence for its falsity is great. 

The study of probability can be interesting and entertaining. Historically,
probability concepts arose in connection with games of chance. Those who 
make use of probability theory are generally awed by the intricacy and 
excitement of the system and the way in which it produces results quite in 
disagreement with intuition, unless one’s intuition has been developed by 
experience with calculated probabilities. A few examples of surprising 
results will convince you of the untrustworthiness of your intuition at this 
stage. 

What is the probability that at least two people in a group of 23 have 
the same birthdate? (Assume that the people are drawn randomly from a 
population of persons in which all 365 birthdates (not counting February 
29) are equally likely.) Is the probability .001 or .0001 or even smaller? 
Actually, the probability that at least two people out of 23 have the same 
birthdate is .507! You should expect multiple birthdates to occur in slightly 
more than half the groups of size 23. It is practically certain that in a group 
of 150 persons at least two people will have the same birthdate! (See Feller, 
1957, pp. 31-32.) 
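The birthday result is easy to verify. It is simplest to compute the probability that all birthdates in the group are different and subtract it from 1. The sketch below, in Python, makes the same assumption as the text (365 equally likely birthdates); the function name is ours.

```python
def prob_shared_birthday(group_size, days=365):
    """Probability that at least two people in the group share a birthdate,
    assuming all `days` birthdates are equally likely."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # The (i + 1)th person must avoid the i birthdates already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

print(round(prob_shared_birthday(23), 3))
print(prob_shared_birthday(150))
```

The product uses the multiplicative rule: each successive person must miss every birthdate already taken.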

Suppose John and Jim have a fair coin (probability of “heads” is 1/2) . . .

. . . educational and psychological research.

The definition of simple random sampling states that at each
selection all of the observations not yet chosen
have the same probability of being chosen. If a card is to be
selected at random from a deck of 52 cards and the first choice is to be random,
then all 52 cards must have the same probability of being chosen, namely 1/52.
If the second choice is to be random, all of the 51 remaining cards must
have the same probability of being chosen, namely 1/51. How do you achieve
this goal of equiprobability in practice? There are numerous ways in which
equiprobability is approximated, some much better than others. Probability
statements about various hands in poker and bridge assume that
equiprobability holds.

. . . the number of any other person who has already been selected.





studies at the master’s and doctor’s level and many others that are published 
in journals stumble on this basic problem. Examples of proper and improper 
analyses will be given throughout this text; you are best advised, however, 
to seek expert advice concerning your sampling technique when you are 
designing an experiment. 

A common fault of naive researchers in education is to choose “class- 
rooms” randomly to participate in an experiment and then to analyze the 
data as if “students” had been chosen randomly. If classroom A is chosen 
for method A of teaching and classroom B for method B, then surely the 
selection of “students” has not been an independent one. If John and George 
are both in classroom A, then John and George must both receive method A. 
There’s no chance for John to receive A and George to receive B. Any 
analysis that treats the 30 students in both A and B as though they con- 
stituted 60 separate observations is quite likely to be wrong because the two 
classrooms are just two “clusters” of students. The average of the 30 
observations in A may be treated as independent of the average of the 30 
observations in B, if other conditions are met. Hence, one has two obser- 
vations that constitute a random sample. This is a difficult point to com- 
prehend. The general problem assumes a slightly different form for all of 
its special instances. The proper method of sampling and analysis sometimes 
manages to elude the most sophisticated researchers. We hope to throw 
more light on the problem later in this text. 

We conclude this section on random sampling with an alternative 
statement of the definition of simple random sampling. Try to satisfy 
yourself that this statement is equivalent to the one given at the beginning 
of this section. 

Simple random sampling: If sampling from n observations is
random, then regardless of what the first n₁ choices were, the
probability of any particular observation's being chosen on the
(n₁ + 1)st selection is 1/(n − n₁).


10.7 

RANDOM VARIABLE 


A random variable is defined in terms of two concepts, one of which has 
already been introduced: sample space and function. We have already 
discussed the meaning of a sample space. It is a collection (finite or infinite) 
of objects or events. A function is any set of ordered pairs of elements, no 
two of which have the same first element. As you can see, the definition of 
a function is quite general; {(a, 1), (b, 2), (c, 3)} is a function, so is {(John,
Alice), (Joe, Mary), (Ted, Sharon), (Jim, Joyce)}. A function is formed when
we specify a rule that associates a unique element with every element of
some set. We form a function when we associate every person with his age:
(Mr. Jones, 37), (Mark Smith, 5), . . . . A random variable is a function
such that all of the first elements are points in a sample space. The previous
example is a random variable. The sample space is all persons; each is
associated with his age.

The following are all examples of random variables (the random variables 
of most interest to statisticians and to us in this text are those in which the 
elements of a sample space are associated with numbers): 

1. Sample space: outcomes of the flip of a coin (“heads,” “tails”).

   Random variable X
   1st element              2nd element
   “heads”                  1
   “tails”                  0

2. Sample space: the 6 different outcomes of rolling a die.

   Random variable Y
   1st element              2nd element
   the face with 1 dot      1
   the face with 2 dots     2
   . . .                    . . .
   the face with 6 dots     6

3. Sample space: a bushel of oranges.

   Random variable Z
   1st element              2nd element (the weight of the orange in ounces)
   Orange #1                3
   Orange #2                5
   Orange #3                2
   . . .                    . . .


If Z is the random variable in (3) above, then Z assumes a value, the
weight in ounces, for each element of the sample space.

You have already met a random variable in this chapter, but no attempt
was made to point it out. The binomial distribution is that of a random
variable. The sample space is the collection of points that are the distinct
outcomes of n binomial trials. The random variable X takes on the values
0, 1, 2, . . . , n according to the number of “successes” that occur in n trials.
If a coin is flipped four times and a “heads” is called a “success,” then X
has the value 3 for the event H, T, H, H, since 3 “successes” occurred.

A sample space can become complex conceptually. For example, a 
child is to read a given page of a reader. He is to do this a large number 
of times; each separate reading of the page is a sample point in the sample 
space of all readings. A random variable X can be defined on this sample 
space by associating with each reading the time from beginning to end.
X takes on a value for each reading: reading 1, 3 min 5 sec; reading 2, 2 min
48 sec; . . . . X is a random variable that takes on values expressed in
“minutes” and “seconds.”

In subsequent chapters we shall see how probabilities are associated with 
values of a random variable. For example, suppose a fair coin is being flipped. 
The random variable X is the same as example (1) above. The probability
that X will take on the value 1 is 1/2, the probability that the event “heads” 
associated with 1 will occur. The probability that the random variable X 
will assume the value 0 is 1/2. 
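The idea of a random variable as a set of ordered pairs maps naturally onto a lookup table. The sketch below, in Python, builds the random variable “number of heads in four flips” as a dictionary whose first elements are sample points and whose second elements are numbers; it then attaches probabilities to the values, assuming a fair coin. The names `space`, `X`, and `distribution` are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: the 16 equally likely outcomes of four flips of a fair coin.
space = list(product("HT", repeat=4))

# The random variable X as a function: sample point -> number of "heads".
X = {point: point.count("H") for point in space}

print(X[("H", "T", "H", "H")])   # value of X for the event H, T, H, H

# Probability that X takes on each of its values 0, 1, ..., 4.
counts = Counter(X.values())
distribution = {value: Fraction(count, len(space))
                for value, count in sorted(counts.items())}
print(distribution)
```

No two pairs share a first element, so X satisfies the definition of a function; many sample points may nonetheless share the same second element.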


10.8 

TYPES OF RANDOM 
VARIABLES 

The statistician finds the distinction between discrete and continuous random 
variables a useful one. We have had no need to use this distinction yet; 
it becomes a useful one when certain operations on random variables are 
defined and performed.

The distinction between discrete and continuous refers to the nature of
the numbers that are associated as second elements of the random-variable
function with sample-space points. A discrete random variable is one that
can assume only certain values on the real-number line. We can think of all
real numbers as points on a line that extends from −∞ to +∞, as in Fig.
10.7. Between two points on the real-number line, a discrete random variable
can assume some values but not others. An example of a discrete random
variable is X in (1) in Sec. 10.7 above. It can assume only the values 0 and
1; it is impossible for X to equal 1/2. A continuous random variable is one
that can assume any value on the real-number line between two points. An
example of a continuous random variable is age. The value of the
variable “age” for a person can equal 5 years, 6 months, 4 days, 11 hours,
14 minutes, 6.132 . . . seconds. A person who has just become 10 years old
has possessed an age equal to every possible number on the real-number line
from 0 to 10 years.

FIG. 10.7 The real-number line.

Which of the following random variables are discrete and which are
continuous?

1. Number of “heads” in six flips of a coin.
2. Time required to solve a concept formation task.
3. Height of mercury in a barometer.
4. Highest temperature of the air during the daylight hours.
5. The number of teeth in an infant’s mouth.
6. The amount of money in the pocket of a corporation president.

Variables 1, 5, and 6 are discrete; the others are continuous. Can a
corporation president have 11.5 cents (not counting trading stamps) in his
pocket? Can a subject require 51.23 seconds to solve a concept formation
task? Can he require 46.721 seconds or 38.50 seconds?

A distinction must be made between the values that a random variable
can assume theoretically and the values that one’s measuring instruments
yield. A variable such as length can assume theoretically any possible real
number between 0″ and 5″, say. For example, analytic geometry tells us
that the hypotenuse of a right triangle whose two legs are each 1″ in length
has length √2″. The number √2 is an unending decimal the first few digits
of which are 1.414 . . . . No physical act of measurement will yield a value
exactly equal to √2. The most sensitive measuring instrument must finally
give up in its attempt to add more figures after 1.414 . . . . Perhaps a finely
calibrated ruler could give lengths to the nearest 1/128th of an inch; if so,
length as measured by this ruler could not assume a value between 6/128″
and 7/128″, for example. Although “length” is theoretically a continuous
random variable, any measurement of it yields discrete values. Nonetheless,
it will be helpful to retain the distinction between discrete and continuous
while remembering that the physical act of measurement yields discrete
numbers.


10.9

PROBABILITY AS AN AREA


The probabilities of observing values of continuous variables, e.g., height,
are represented by mathematical curves known as probability
distributions. Suppose we have a continuous random variable X that can
take on values from 0 to 10. For example, X could be the time required
for subjects to solve a certain puzzle. They may solve it almost immediately
or they may take as long as 10 minutes but no longer. Presumably the length
of time required to solve the problem is known for a huge number of different
subjects. A graph is drawn in which the “time to solution” is graphed against
“proportion of subjects requiring that time” (see Fig. 10.8).

FIG. 10.8 The probability density function of the variable X, time
required to solve a puzzle.

The proportion of subjects requiring between 2 and 4 minutes to solve
the puzzle can be regarded as the probability that a subject selected at random
will require between 2 and 4 minutes to solve the puzzle. This
probability is equal to the shaded area in Fig. 10.8. What area corresponds
to the probability that a randomly selected subject will take less than 0.5
minutes? (Theoretically, the probability of a subject's taking exactly 4
minutes, say, is zero because there is no area above a single point.) If the
area under the curve between two values of X were .07, then in a group of 100
randomly chosen subjects we would expect about 7 of them to take a time
between those two values.

6 “The smUtWa'n fmquentfy pWs'ihe values a continuous random variable 
The statisli M ^ ^ area b5twEcn any , wo values of the 

can assume in such ? that lh[ . variable will assume a value between 
variable equals the p * is c3 ]led a probability density function. 

those two 'values. _ Th ‘ J a mathematical f uncl ion in such a way 

The graph can found by substituting any value of the random 

that the ordinate TO «n be loun y , hal can take on 

variable X For ^'^he^Jprobahitity. If we let P(X) = 1/2 for 
will be .he probability density 

function of. X. of the rectangle in Fig. 10.9, is exactly 1 

,0 5 ^ 0 ) The shaded area is the probability that X takes on a value 
between 0 and 1. What does this probability equal? 
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FIG. 10.9 Probabi lily density func- 
tion of the variable X that assumes 

all possible values between 0 and 2 

X wiih equal probability. 
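The idea that a probability is an area can be checked numerically. The following is only a rough sketch in Python, assuming the flat density P(X) = 1/2 on the interval from 0 to 2 described for Fig. 10.9; the function names `density` and `area_under` are illustrative and do not come from the text.

```python
# Sketch: probability as the area under a density curve.
# Assumption: the density is P(X) = 1/2 for X between 0 and 2 (Fig. 10.9).

def density(x):
    """Height of the flat density curve at the point x."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b by the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under(density, 0.0, 2.0)   # area of the whole rectangle
prob_0_to_1 = area_under(density, 0.0, 1.0)  # area of the shaded region

print("total area:", round(total_area, 6))
print("P(0 < X < 1):", round(prob_0_to_1, 6))
```

The same `area_under` helper could be pointed at any other density for which the height of the curve can be computed, which is the sense in which integration replaces the discrete sum.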


10.10 

EXPECTATIONS AND 
MOMENTS 

Moments are characteristics of distributions defined in terms of expectations. 
We shall consider the definition of the expectation of a random variable X 
first. 


Definition: If X is a discrete random variable that takes on the
values $X_1, X_2, \ldots, X_n$ with probabilities $p_1, p_2, \ldots, p_n$,
then the expectation of X, denoted by E(X), is
defined as

$$E(X) = p_1 X_1 + p_2 X_2 + \ldots + p_n X_n = \sum_{i=1}^{n} p_i X_i,$$

where $p_1 + p_2 + \ldots + p_n = 1$.


E ,oup C d“ auT Simi ' lr ' hi! '• * 3 fOT — P“«"S .ha mean from 

not ror samp,es - 

.he mean 0 «* 

' The maternal, call, rophisticared reader n , „„„ 


n('-' X ') 


l-IWl-l. 

.V-.„ * 

■ ... the mean or He ,nS„„e nombe, of x ,, ^ 

'’“.I™. (*)• 

tepropOTtionofall thenane. ihararo .... B 

tehlT' i'r"*"'" ^ °"" r 10 He oMhe » •[“T’ r ‘- 

f. -5 for one random Rep of an unbfaard noli * * ** ,h '° retl “' «al"e% 
The names "expectation" and "expected value" are synonymous. Some
examples of expectations are as follows:

1. Suppose X is the random variable that has 6 possible values,
1, 2, ..., 6. The events of the sample space could be the 6 sides
of a die. Assume that a probability of 1/6 is associated with each
value of X. What is the value of E(X)?

$$E(X) = \tfrac{1}{6}(1) + \tfrac{1}{6}(2) + \tfrac{1}{6}(3) + \tfrac{1}{6}(4) + \tfrac{1}{6}(5) + \tfrac{1}{6}(6) = \tfrac{1}{6}(1 + 2 + \ldots + 6) = \tfrac{21}{6} = 3.5,$$

halfway between the smallest value, 1, and the largest value, 6:

$$\frac{1 + 6}{2} = 3.5.$$

In this example, $E(X) = \mu = 21/6 = 3.5$. In repeatedly rolling the
die, one can "expect" to average 3.5 points.

2. A particular slot-machine has payoffs of $0.00, $0.50, $1.00, and
$2.00. The probabilities associated with each of these occurrences
are .80, .15, .04, and .01, respectively. Define a random variable
X that takes on the four values 0, 50, 100, and 200 cents with
probabilities .80, .15, .04, and .01. What is the value of E(X)?

$$\mu = E(X) = .80(0) + .15(50) + .04(100) + .01(200) = 0 + 7.5 + 4.0 + 2.0 = 13.5.$$

If it costs $0.25 for each trial on this slot-machine, would you like
to play it?

3. Let X be the random variable that corresponds to the number of
"heads" in 4 flips of a fair coin. X can take on the values 0, 1, 2, 3,
and 4. First we must calculate the probabilities that X will take on
each of the values from 0 to 4, i.e., the probabilities that there
will be 0 "heads," ..., 4 "heads" in 4 flips of a fair coin.

$$\text{Probability}(X = 0) = \binom{4}{0}\left(\tfrac{1}{2}\right)^4 = \frac{4!}{0!\,4!} \cdot \frac{1}{16} = \frac{1}{16}$$

$$\text{Probability}(X = 1) = \binom{4}{1}\left(\tfrac{1}{2}\right)^4 = \frac{4!}{1!\,3!} \cdot \frac{1}{16} = \frac{4}{16}$$

If you have forgotten how to compute the remaining three probabilities,
refer back to the relevant sections of this chapter. Then

$$E(X) = \tfrac{1}{16}(0) + \tfrac{4}{16}(1) + \tfrac{6}{16}(2) + \tfrac{4}{16}(3) + \tfrac{1}{16}(4) = \tfrac{32}{16} = 2.$$

Actually, one need not compute the mean of a binomial
distribution this long way, because that mean is always merely np,
where n is the number of trials and p is the probability per trial.
Here np = 4(1/2) = 2, which agrees with the result computed above
via the $\sum p_i X_i$ formula.
How does this result differ from that for example 1, above? If we had
wanted to know the average value that would be obtained in repeatedly
spinning an unbiased coin, where T = 0, H = 1, and $p_T = p_H = 1/2$, we
would have computed 1/2(0) + 1/2(1) = 1/2, not the 2 found above for a
different problem.

Analogously, if in example 1 we had wanted to know the number of,
say, 5's to be expected in rolling an unbiased die 4 times, we would have been
working with the binomial expansion $(1/6 + 5/6)^4$; 1/6 is the probability
of securing a 5 on any roll of the die. The number 5 can occur 0, 1, 2, 3, or
4 times in 4 rolls of the die. The answer would be

$$p_0(0) + p_1(1) + p_2(2) + p_3(3) + p_4(4) = \tfrac{1}{6}(4) = 2/3,$$

not the 3.5 found in example 1 for another problem.
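The expectations worked out above can be reproduced mechanically from the definition $E(X) = \sum p_i X_i$. The sketch below is an illustration only: the helper names `expectation` and `binomial_probs` are ours, and exact fractions are used so that no rounding enters the arithmetic.

```python
from fractions import Fraction
from math import comb

def expectation(values, probs):
    """E(X) = sum of p_i * X_i for a discrete random variable."""
    probs = list(probs)
    assert sum(probs) == 1, "probabilities must sum to 1"
    return sum(p * x for x, p in zip(values, probs))

def binomial_probs(n, p):
    """Probabilities of 0, 1, ..., n successes in n binomial trials."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Example 1: one roll of a fair die.
die = expectation(range(1, 7), [Fraction(1, 6)] * 6)

# Example 2: slot-machine payoffs in cents.
slot = expectation(
    [0, 50, 100, 200],
    [Fraction(80, 100), Fraction(15, 100), Fraction(4, 100), Fraction(1, 100)],
)

# Example 3: number of heads in 4 flips of a fair coin.
heads = expectation(range(5), binomial_probs(4, Fraction(1, 2)))

# Number of 5's in 4 rolls of a fair die (a different problem from example 1).
fives = expectation(range(5), binomial_probs(4, Fraction(1, 6)))

print(die, slot, heads, fives)
```

Note that `heads` and `fives` both agree with the shortcut np, while `die` answers a different question about the same die.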

If we rolled a die 4 times, and then 4 more times, etc., until we had rolled 
it 6 sets of 4 times each, we would expect a total of (2/3) (6 sets of rolls) = 4 
fives to appear. This is reasonable, for we would have rolled the die inde- 
pendently 4(6) = 24 times, each time with a probability of 1/6 of rolling 
the number five; 24(1/6) = 4. 

Obviously, one must state his probability problem carefully and then
solve it, rather than a similar-sounding but different problem. This can be
quite tricky, as even some eminent mathematicians have regretfully learned.

If X is a continuous variable instead of a discrete one, then an algebraic 
function describes the form of its probability distribution. As we saw earlier, 
if X is continuous we cannot assign a probability to a single value of X.
Instead, we make statements about the probability that X lies in an interval.
For these reasons, the definition given above for E(X) cannot be applied 
to a continuous random variable. Unfortunately for those who have no 
knowledge of calculus, there seems to be no sensible way to define the 
expectation of a continuous variable without recourse to the calculus concept 
of integration. We shall attempt to sketch heuristically the idea of E(X) 
when X is continuous so that those without a knowledge of calculus will not 
be at any disadvantage in our later discussions. 

Suppose X is a continuous random variable and the probability dis-
tribution of X looks like the one in Fig. 10.10. There is an algebraic rule
that gives the height of the curve in Fig. 10.10 for every value of X. The area
under the curve is 1 unit. The probability that X will assume a value
between, for example, 2 and 3 is equal to the area under the curve between
2 and 3.

FIG. 10.10 Probability distribution
of X.

Definition: The expectation of the continuous random variable
X is the sum of the products formed by multiplying 
each value that X can assume by the height of the 
probability function curve above that value of X. 

Since X can take on infinitely many values, you might wonder how you 
could physically multiply each of the separate values of X by the height of 
the curve at X to find its expectation. This is the problem that recourse to 
the integral calculus solves. We ask that you take it on faith that it can be 
done, in a precise but somewhat indirect way, by “integration.” 

The expectation of a continuous random variable X is denoted by E(X)
or $\mu$, as is the expectation of a discrete variable.


Moments 


Moments are quantities that describe the distributions of variables. 

Definition: The first moment of a random variable X is $\mu$, the
expectation of X.

The first moment is also called the "population mean." $\mu$, the first
moment of X, describes the general location of the distribution along a line.
For some distributions (for most met in practice), E(X), or $\mu$, is a good
indicator of the central point toward which the values of X tend. Suppose
X and Y are both normally distributed random variables, but $E(X) = \mu_X = 10$
and $E(Y) = \mu_Y = 5$. Then we know that the distribution of X is generally
to the right, on the number line, of the distribution of Y, as in Fig. 10.11.

FIG. 10.11 Distributions of X and Y where E(X) = 10 and E(Y) = 5.

Definition: The second moment of a random variable X is
$E(X^2)$.

$$E(X^2) = p_1 X_1^2 + p_2 X_2^2 + \ldots + p_n X_n^2$$

if X is discrete.


We have little use for the second moment directly. The idea of a “second 
moment” is used to define a very important concept, however. 
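For a discrete variable the first and second moments differ only in the power to which each value is raised before weighting by its probability. As a small sketch (the helper name `moment` is ours, not the text's), here are both moments for one roll of a fair die:

```python
from fractions import Fraction

def moment(values, probs, k):
    """k-th moment of a discrete variable: sum of p_i * X_i**k."""
    return sum(p * x**k for x, p in zip(values, probs))

faces = range(1, 7)                # values 1, 2, ..., 6
probs = [Fraction(1, 6)] * 6       # each face has probability 1/6

first_moment = moment(faces, probs, 1)    # E(X), the population mean
second_moment = moment(faces, probs, 2)   # E(X^2)

print(first_moment, second_moment)
```

Exact fractions are used here purely so the two results can be compared with hand computation.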
is the probability that a child entering kindergarten will require special teaching,
i.e., will have either perceptual or emotional problems or both?

3. a. Find 4!.

b. Find 15!/13!.

d. For what value of n is (n + ...)! exactly twenty times larger than n!?

4. An experimenter wishes ... least one subject can learn every possible
ordering of pairs?

5. a. Find ...

b. Find ...

6. A team has 13 members. How many possible "starting fives" (the five
players who start the game) could the coach form from his
team of 13 players?

7. Verify that $\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4}$ equals $2^4$.
(In general, $\sum_{r=0}^{n} \binom{n}{r} = 2^n$.)

8. In how many ways may a ten-item test be split into two tests of five items each?
(Hint: Find the number of ways to select five items from ten to comprise one
half. Selecting five items for one half is the same as not selecting them for
the other half ... 2.)

9. A pupil takes a 10-item true-false test. If he guesses randomly between
"true" and "false" on each item, what is the probability that he will get
4 or more correct? (Hint: Find the probabilities of 0, 1, 2, and 3 items
correct and subtract the sum of these probabilities from 1.)

10. Graph the expected distribution of total test scores X that would be obtained
by 256 completely ignorant examinees who will guess at random the answers
to each of four four-option multiple-choice items. (Hint: Regard the test items
as four binomial trials for which the probability of success is 1/4. Multiply each
term of the binomial distribution with n = 4 and p = 1/4 by 256 to obtain the
number of examinees that are expected to obtain each test score from 0
to 4.) ... distribution of chance scores for 90 ...
skewness of your results with that of his
Curve 2.
11. It can be proved that if n binomial trials result in $n_1$ successes, then the probability
of this happening is maximal if p, the probability of success on any one trial,
equals $n_1/n$. (When p is unknown and the outcome of n trials is $n_1$ successes,
the ratio $n_1/n$ is termed the "maximum likelihood estimate" of the unknown p
because this value of p, namely $n_1/n$, makes the outcome more probable than
does any other value of p.) Suppose 4 binomial trials yield 3 successes. Verify
that this outcome has higher probability if p = 3/4 than if p = 1/2 or 4/5.

12. Ten convicts volunteered for an experiment on the relationship between 
smoking and lung cancer. The convicts were matched into five matched pairs 
so that both pair mates are of the same age. Within each pair of convicts a 
coin was flipped to determine which convict would smoke two packs of cigarettes 
a day and which one would not smoke for the duration of the experiment. 
At the end of the 30-year experimental period the five smokers in each pair 
had lung cancer; none of the nonsmokers had lung cancer. (This experiment 
is pure fiction.) Suppose that at the outset of the experiment, five convicts 
either had lung cancer or were destined to develop it in the next 30 years whether 
they smoked or not. What is the probability that if smoking is truly unrelated
to lung cancer, the five initially cancerous convicts were totally by chance
assigned to be the five experimental smokers?

13. In the general population, Stanford-Binet IQ’s are nearly normally distributed 
with a mean of 100 and a standard deviation of 16. By referring to Table B 
in Appendix A, determine the following probabilities to two decimal places: 

a. That a randomly sampled person will have an IQ between 80 and 120. 

b. That a randomly sampled person will have an IQ above 140. 

c. That three independently randomly sampled persons will all have IQ’s 
above 92. 

14. The variable X takes on the values 0, 1, 2, 3, and 4 with probabilities 0, 2/5,
1/5, 1/5, and 1/5, respectively. What is the value of E(X), the expected value
of X?

15. The variable X is binomially distributed with n = 4 and p = 1/2, i.e., X takes
on the following values with the following probabilities:

    X    Probability (X)
    0    1/16
    1    4/16
    2    6/16
    3    4/16
    4    1/16

a. Determine E(X).

b. Compare this value with np = 4(1/2) = 2.

c. Determine the variance of X, which is

$$\sigma^2 = \sum_i [X_i - E(X)]^2 \cdot \text{probability}(X_i).$$

Compare this value with npq = 4(1/2)(1/2) = 1.
16. Trials of an event that can happen in one of three ways are called trinomial.
If the event A can occur in three possible ways ($a_1$, $a_2$, $a_3$) with probabilities
$p_1$, $p_2$, and $p_3$ ($p_1 + p_2 + p_3 = 1$), respectively, then it can be shown that the
probability that n trials of A will produce r occurrences of $a_1$, s occurrences
of $a_2$, and t occurrences of $a_3$ is given by the general formula for the trinomial
distribution:

$$p(r, s, t) = \frac{n!}{r!\,s!\,t!}\, p_1^r\, p_2^s\, p_3^t.$$

a. Show that if $p_3 = 0$ the formula for the trinomial distribution reduces to
the formula for the binomial distribution. (Hint: If $p_3$ is zero, t can never
be anything except zero and s must equal n − r.)

b. The First Methodist Church is planning a Sunday evening pot-luck supper.
Each of 20 families is asked to bring either a salad or a "hot dish" or a
dessert. Assume that the families make their choice of a contribution to
the supper independently and with the following probabilities: for salad,
$p_1 = .10$; for "hot dish," $p_2 = .60$; for dessert, $p_3 = .30$. Calculate the
probability that, to the chagrin of the parents and to the delight of the
children, each of the 20 families brings a dessert to the pot-luck supper.
THEORETICAL DISTRIBUTIONS
FOR USE IN

STATISTICAL INFERENCE


11.1

INTRODUCTION

Part of the "working tools" of inferential statistical methods are a group of
theoretical distributions of some special variables. In this section we shall
study four such distributions: the normal, which was the subject of Chapter
6; the chi-square distribution; the t-distribution; and the F-distribution.


11.2

NORMAL DISTRIBUTION

In Chapter 6 we learned that a normal distribution is a particular type of
mathematical curve. The graph of a normal distribution is symmetric
about its mean, $\mu$; it is unimodal; it has a kurtosis of 3.0, etc.

The normal distribution (in fact there are many, one for every different
set of values of $\mu$ and $\sigma$) is very important in statistical inference. Many
inferential statistical techniques rest on the assumption that the frequency
distribution of scores on a variable in a population is adequately described
as a normal distribution with a certain mean and standard deviation. It
will be seen in this section that the other theoretical distributions build upon
the normal distribution.
This primacy of the normal distribution rests on three facts, one empirical,
the others mathematical. The normal distribution happens to be a good
representation of the frequency distributions of scores on a large number of
different variables. In Sec. 6.6 we saw that the normal distribution was a
good representation of the frequency distribution of ... born in Great
Britain during ... It is a mathematical fact that averaging individual scores
... from a normal distribution ... the two estimates produced for repeated
samples are independent.


11.3

CHI-SQUARE DISTRIBUTIONS

Imagine a huge population of scores X that are essentially normally distributed
with mean $\mu$ and standard deviation $\sigma$. Suppose one score is drawn at
random from this population and is standardized, $z = (X - \mu)/\sigma$; such z
scores have a normal distribution with mean 0 and standard deviation 1. The
square of a z score selected from a normal distribution is symbolized by
$\chi^2_1$:

$$\chi^2_1 = z^2 = \frac{(X - \mu)^2}{\sigma^2}. \qquad (11.1)$$

The subscript 1 tells us that only one z score was squared to produce
$\chi^2_1$. We can conceive of repeating the process an unlimited number of times:
each time a new X score is drawn, standardized, and squared, and a value of
$\chi^2_1$ is recorded. If a frequency polygon of the values of $\chi^2_1$ obtained
were smoothed and reduced so that the area under it was one unit, it would
look like the curve in Fig. 11.1.

FIG. 11.1 Graph of the chi-square distribution with one
degree of freedom. (The shaded area composes 30% of the area
under the curve.)

We denote the curve in Fig. 11.1 by $\chi^2_1$. The curve takes its name from
the Greek letter $\chi$ used to denote it. (The mathematical curve for the
chi-square distribution was derived by Karl Pearson in 1900.)

The area under the curve for $\chi^2_1$ is set equal to one unit so that $\chi^2_1$ is a
probability distribution, e.g., the probability of obtaining a value of $\chi^2_1$
between 0.5 and 2.5 equals the area under the curve between 0.5 and 2.5.
In Fig. 11.1 we see that .30 or 30% of the area under the curve lies to the
right of 1.07. Thus, we know that the probability of obtaining a value of
$z^2 = \chi^2_1$ that exceeds 1.07 is .30. In other words, 30% of the z scores randomly
selected from a normal distribution will have squares that exceed 1.07. An
equivalent statement of this fact is that the 70th percentile in the chi-square
distribution with one degree of freedom equals 1.07. We also write this as:

$${}_{.70}\chi^2_1 = 1.07,$$

where $\chi^2_1$ denotes the chi-square distribution with one degree of freedom and
.70 indicates the 70th percentile of that distribution.
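The statement that about 30% of squared z scores exceed 1.07 can be checked directly from the unit normal distribution, since $z^2 > c$ exactly when $|z| > \sqrt{c}$. The sketch below assumes only Python's standard `math.erf`; the function name `prob_z_squared_exceeds` is ours.

```python
from math import erf, sqrt

def prob_z_squared_exceeds(c):
    """P(z**2 > c) for a unit normal z, which equals P(|z| > sqrt(c))."""
    # For a unit normal z, P(|z| <= a) = erf(a / sqrt(2)).
    return 1.0 - erf(sqrt(c) / sqrt(2.0))

p_exceeds = prob_z_squared_exceeds(1.07)
print("P(chi-square with 1 df > 1.07) is about", round(p_exceeds, 2))
```

The result should lie close to the .30 read from Fig. 11.1; small differences reflect the rounding of the tabled value 1.07.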

Now we shall develop the chi-square distribution with two degrees of
freedom, $\chi^2_2$. Suppose we go back to the original normal distribution of X.
Instead of drawing out just one X score, draw two X scores at random and
independently. Standardize each of these scores by subtracting $\mu$ from it and
dividing the difference by $\sigma$. Call the first (by order of selection, not by
size) standardized score $z_1$ and the second $z_2$. Now square and sum the two
z's to form the quantity

$$\chi^2_2 = \frac{(X_1 - \mu)^2}{\sigma^2} + \frac{(X_2 - \mu)^2}{\sigma^2} = z_1^2 + z_2^2. \qquad (11.2)$$

This process of determining a $\chi^2_2$ could be repeated thousands of times
with new pairs of z scores. A frequency polygon of these $\chi^2_2$ scores could be
constructed, smoothed, and reduced so that the area under the curve was
one unit. The resulting curve would look like the graph of the mathematical
curve $\chi^2_2$, the chi-square distribution with two degrees of freedom. Fig. 11.2
shows a graph of $\chi^2_2$.

The Chi-Square Distribution with
n Degrees of Freedom, $\chi^2_n$

A chi-square variable with n degrees of freedom, $\chi^2_n$, is formed by adding
together the squares of n independent z scores from a normal distribution:

$$\chi^2_n = z_1^2 + z_2^2 + \ldots + z_n^2. \qquad (11.3)$$

If a large number of these $\chi^2_n$ values are generated from separate sets
of n z scores, their frequency polygon will have the same shape as the mathe-
matical curve $\chi^2_n$. Figure 11.3 illustrates the graphs of chi-square
distributions for several values of n.

The area under each curve in Fig. 11.3 is one unit. One-half the area
under $\chi^2_{10}$ lies above the point 9.34. Hence, we know that the probability is
.50 that the sum of the squares of ten z scores drawn at random from a
normal distribution will exceed 9.34. Equivalently, ${}_{.50}\chi^2_{10} = 9.34$, the median
of the chi-square distribution with ten degrees of freedom.

FIG. 11.3 Graphs of chi-square distributions.

There is a different chi-square distribution for each integer value of n
(1, 2, 3, ...). The properties of the curve $\chi^2_n$ depend upon the value of n.
The following facts provide a partial description of the family of chi-square
distributions:

1. The mean of a chi-square distribution with n degrees of freedom
is equal to n. For example, the average value of $\chi^2_{12}$ one would
expect to obtain by squaring and summing 12 independent, stand-
ardized normal scores is 12.

2. The mode of $\chi^2_n$ is at the point n − 2 for n = 2 or greater.

3. The standard deviation of $\chi^2_n$ is $\sqrt{2n}$.

4. The skewness of $\chi^2_n$ is $\sqrt{8/n}$. Hence, every chi-square distribution
is positively skewed, but the asymmetry becomes very slight for
large n.

5. As n becomes large, $\chi^2_n$ approaches more nearly a normal distri-
bution with mean n and standard deviation $\sqrt{2n}$.
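The first and third facts in the list can be illustrated by carrying out, on a computer, the sampling process that defines $\chi^2_n$. The following is a sketch under stated assumptions: 20,000 repetitions stand in for the "large number" of values, the seed is arbitrary, and the helper name `chi_square_draw` is ours.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def chi_square_draw(n, rng):
    """Sum of the squares of n independent unit normal z scores."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(12345)   # arbitrary seed, only for repeatability
n = 12
draws = [chi_square_draw(n, rng) for _ in range(20_000)]

sample_mean = mean(draws)    # should be near n
sample_sd = pstdev(draws)    # should be near sqrt(2n)

print("mean:", round(sample_mean, 2), "expected:", n)
print("sd:  ", round(sample_sd, 2), "expected:", round(sqrt(2 * n), 2))
```

Because the values are simulated, the sample mean and standard deviation will only approximate 12 and $\sqrt{24}$; the agreement tightens as the number of repetitions grows.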

An important theorem concerning combinations of chi-square variables 
can be stated as follows: 

If $\chi^2_{n_1}$ has a chi-square distribution with $n_1$ degrees of freedom
and if $\chi^2_{n_2}$ has a chi-square distribution with $n_2$ degrees of
freedom and is independent of $\chi^2_{n_1}$, then $\chi^2_{n_1} + \chi^2_{n_2}$ has a
chi-square distribution with $n_1 + n_2$ degrees of freedom.

The pth percentile in the chi-square distribution with n degrees of freedom
is denoted by ${}_{p}\chi^2_n$. The percentiles of the chi-square distribution play a
prominent role in inferential statistical techniques, particularly as applied
to nominal data. Various percentiles in chi-square distributions for n = 1
up to n = 30 appear in Table C of Appendix A to this text. The following
is an example of how Table C is read: Suppose one wishes to find the 50th
percentile in the chi-square distribution with four degrees of freedom, i.e.,
${}_{.50}\chi^2_4$. First, the row labeled 4 in Table C is located. Second, the column
headed "50th percentile" is found; it is near the center of the table. At the
intersection of the appropriate row and column, the number 3.36 is found. 
This is the value of the median of the chi-square distribution with four 
degrees of freedom. 
11.4

F-DISTRIBUTIONS

Imagine that a chi-square variate with five degrees of freedom, $\chi^2_5$, is formed
from $z_1^2 + z_2^2 + \ldots + z_5^2$. Now suppose a second, independent chi-square variate
with 10 degrees of freedom, $\chi^2_{10}$, is formed by sampling values ... Form the
ratio of the two chi-square variates, each divided by its respective degrees
of freedom:

$$F_{5,10} = \frac{\chi^2_5/5}{\chi^2_{10}/10}. \qquad (11.4)$$

It is known from mathematical statistics that the distribution of $F_{5,10}$ is
an F-distribution with 5 degrees of freedom for the numerator and 10 degrees
of freedom for the denominator. The F-distribution with 5 and 10 degrees of
freedom is a positively skewed distribution with mean (10)/(10 − 2) = 1.25
and a median less than 1. Because of the squaring, only positive values of
$F_{5,10}$ may occur; hence the $F_{5,10}$ distribution has all of its area to the
right of zero. The area under the $F_{5,10}$ distribution between any two values
is equal to the probability of obtaining an F-ratio between those two values.
Later in this text selected percentile points in F-distributions will be useful.
Certain percentile points in the F-distributions were tabulated long ago for
easy reference. We know, for example, that the 90th percentile in the
F-distribution with 5 and 10 degrees of freedom, ${}_{.90}F_{5,10}$, is equal to 2.52.
That is, the probability is .90 of obtaining a value of $F_{5,10}$ between 0 and
2.52. ${}_{.95}F_{5,10}$ equals 3.33, and ${}_{.99}F_{5,10}$ equals 5.64.

There exists a different F-distribution to describe the distribution of
each F-ratio with a unique combination of degrees of freedom for the chi-
square variates in the numerator and denominator. In general, if two
independent chi-square variates, one with $n_1$ degrees of freedom and the
other with $n_2$ degrees of freedom, are combined as

$$F_{n_1,n_2} = \frac{\chi^2_{n_1}/n_1}{\chi^2_{n_2}/n_2}, \qquad (11.5)$$

then $F_{n_1,n_2}$ has an F-distribution with $n_1$ and $n_2$ degrees of freedom. Such an
F-distribution with $n_1$ degrees of freedom for the numerator and $n_2$ degrees
of freedom for the denominator has the following properties:

1. It is positively skewed.

2. It is unimodal.

3. It has a median of 1 or less.

4. It has a mean equal to $n_2/(n_2 - 2)$ for $n_2 > 2$.

FIG. 11.4 Graphs of two F-distributions.
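The construction in Eq. (11.5) can be imitated by simulation to see these properties emerge for $F_{5,10}$. This is a sketch only: the number of repetitions (20,000) and the seed are arbitrary choices, and the helper names `chi_square_draw` and `f_ratio_draw` are ours.

```python
import random
from statistics import mean

def chi_square_draw(n, rng):
    """Sum of the squares of n independent unit normal z scores."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def f_ratio_draw(n1, n2, rng):
    """Ratio of two independent chi-squares, each divided by its df."""
    return (chi_square_draw(n1, rng) / n1) / (chi_square_draw(n2, rng) / n2)

rng = random.Random(2024)    # arbitrary seed, only for repeatability
ratios = [f_ratio_draw(5, 10, rng) for _ in range(20_000)]

sample_mean = mean(ratios)                                      # near 10/(10 - 2)
share_below_252 = sum(r < 2.52 for r in ratios) / len(ratios)   # near .90
share_below_one = sum(r < 1.0 for r in ratios) / len(ratios)    # above .50

print("mean:", round(sample_mean, 2))
print("share below 2.52:", round(share_below_252, 3))
print("share below 1:", round(share_below_one, 3))
```

A share below 1 that exceeds one-half is what a median less than 1 means in practice; every simulated ratio is positive, as the squaring guarantees.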


The F-distributions with ... degrees of freedom and 3 and 25 degrees
of freedom appear in Fig. 11.4.

For example, the fifth percentile in the distribution F... equals
... 0.30.

Table E of Appendix A gives upper percentile points in
the F-distributions for numerous values of $n_1$ and $n_2$. The columns of Table
E correspond to values of $n_1$; the rows of Table E correspond to values of
$n_2$. Check the table to see if you can verify that the 99th percentile in F...
is 3.37.

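11.5

t-DISTRIBUTIONS

Suppose one observation z is drawn at random from the unit normal
distribution and, independently, one observation is drawn from the chi-square
distribution with 10 degrees of freedom; and suppose the two are combined as

$$t_{10} = \frac{z}{\sqrt{\chi^2_{10}/10}}. \qquad (11.7)$$

The quantity $t_{10}$ in Eq. (11.7) is one observation from the t-distribution with
10 degrees of freedom. That is, if the process of randomly drawing
one observation each from the unit normal distribution and from $\chi^2_{10}$ and
forming $t_{10} = z/\sqrt{\chi^2_{10}/10}$ was repeated an infinite number of times, the values
of $t_{10}$ would form a t-distribution with ten degrees of freedom.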

The t-distribution with 10 degrees of freedom is described by a symmetric,
unimodal curve. The mean of the distribution is 0; the standard deviation
is slightly greater than 1, approaching 1 as the degrees of freedom increase.
The distribution is somewhat leptokurtic, i.e., it has kurtosis greater than
3; hence, it is more peaked and has more area in the extreme "tails" of the
distribution than the normal distribution.

There is not just one t-distribution; as was true of the $\chi^2$- and F-dis-
tributions, there exists a family of t-distributions. There is a different
t-distribution for every distinct number of degrees of freedom for the chi-
square variable in the denominator of Eq. (11.7). If the denominator of
Eq. (11.7) involves a chi-square variable with 5 degrees of freedom, then

$$t_5 = \frac{z}{\sqrt{\chi^2_5/5}}$$

has a t-distribution with 5 degrees of freedom. In general, if z has a unit
normal distribution and $\chi^2_n$ is a chi-square variable with n degrees of freedom
and is independent of z, then

$$t_n = \frac{z}{\sqrt{\chi^2_n/n}} \qquad (11.8)$$

has a t-distribution with n degrees of freedom.

All of the t-distributions are described by symmetric, unimodal curves
with a mean of 0. The variance of the t-distribution with n degrees of freedom
is n/(n − 2). They are all slightly leptokurtic. As n becomes larger and
larger, the distribution $t_n$ begins to look more and more like a normal
distribution. When n is infinitely large (a theoretical possibility that is
empirically impossible) the t-distribution is the same as the normal dis-
tribution. The t-distributions with degrees of freedom 1, 5, and 25 appear
along with the normal distribution in Fig. 11.5.
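Equation (11.8) can likewise be acted out by simulation. The sketch below forms many values of $t_{10}$ from a unit normal score and an independent chi-square variable, then compares the results with the mean of 0 and the variance n/(n − 2) stated above, and with the tabled 95th percentile 1.812 for 10 degrees of freedom. The seed, the 20,000 repetitions, and the helper name `t_draw` are our assumptions.

```python
import random
from math import sqrt
from statistics import mean, pvariance

def t_draw(n, rng):
    """One t value: unit normal z over sqrt(chi-square / n), Eq. (11.8)."""
    z = rng.gauss(0.0, 1.0)
    chi_square = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return z / sqrt(chi_square / n)

rng = random.Random(7)       # arbitrary seed, only for repeatability
n = 10
draws = [t_draw(n, rng) for _ in range(20_000)]

sample_mean = mean(draws)                                      # near 0
sample_var = pvariance(draws)                                  # near n/(n - 2)
share_below = sum(t < 1.812 for t in draws) / len(draws)       # near .95

print("mean:", round(sample_mean, 3))
print("variance:", round(sample_var, 3), "expected:", n / (n - 2))
print("share below 1.812:", round(share_below, 3))
```

Repeating the run with a larger n would show the variance shrinking toward 1, in line with the approach of $t_n$ to the normal distribution.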

In subsequent discussions of statistical inference, selected percentile
points in a t-distribution will have to be found. The pth percentile in the t-
distribution with 10 degrees of freedom will be denoted by ${}_{p}t_{10}$. The most
often used percentiles in the t-distributions appear in Table D in Appendix A.
There we read, for example, that the 95th percentile in the t-distribution
with 10 degrees of freedom, i.e., ${}_{.95}t_{10}$, is equal to 1.812.

Only the upper-percentile points in the t-distributions appear in Table D.
Because of a simple relationship it is unnecessary to tabulate both upper-
and lower-percentile points. The symmetry of all t-distributions implies that
the negative of the pth percentile in the t-distribution with n degrees of
freedom equals the (1 − p)th percentile in the same distribution. For example,

${}_{.95}t_{10} = 1.812$, therefore ${}_{.05}t_{10} = -1.812$,
${}_{.99}t_{20} = 2.528$, therefore ${}_{.01}t_{20} = -2.528$,
${}_{.90}t_{5} = 1.476$, therefore ${}_{.10}t_{5} = -1.476$.

11.6

RELATIONSHIPS AMONG THE
NORMAL, t-, CHI-SQUARE
AND F-DISTRIBUTIONS

The t-, $\chi^2$-, and F-distributions are all based on the normal distribution.
In each instance, sampling from a normal distribution underlies the
distribution. For example, a chi-square variable is built from squared unit
normal scores, and the square of a t-variable with n degrees of freedom is
the square of a unit normal score divided by an independent chi-square
variable divided by its degrees of freedom. Stated in slightly different form,

$$t_n^2 = \frac{z^2}{\chi^2_n/n} = \frac{\chi^2_1/1}{\chi^2_n/n}. \qquad (11.11)$$

We recognize, however, that Eq. (11.11) is an F-variable with 1 and n degrees
of freedom. Therefore, the square of a t-variable with n degrees of freedom
is an F-variable with 1 and n degrees of freedom.

FIG. 11.6 The family of F-distributions and their relationship to the
normal, t-, and $\chi^2$ distributions.

It is somewhat more difficult to prove another interesting fact, which
we shall simply state:* Any F-distribution with n degrees of freedom for the
numerator and infinite degrees of freedom for the denominator is the same as
the $\chi^2_n$ distribution divided by the constant n; i.e.,

$$F_{n,\infty} = \frac{\chi^2_n}{n}.$$

* The proof depends on the fact that $\lim_{n \to \infty} \chi^2_n / n = 1$.

All of these facts are depicted in Fig. 11.6. Figure 11.6 is a cross-
classification of the family of F-distributions with respect to the degrees of
freedom (1 through $\infty$) of numerator and denominator. Each cell in Fig.
11.6 corresponds to an F-distribution. When a cell happens to coincide with
a special case of either the normal, t-, or $\chi^2$ distributions, the symbol for the
F-distribution does not appear; this should not be taken to mean that no
such F-distribution exists for that cell, however, for an F-distribution could be
shown in every cell.

The pth percentile in the $\chi^2_n$ distribution is the same as the pth percentile
in the $n(F_{n,\infty})$ distribution. However, if you square the pth percentile in
the t-distribution with n degrees of freedom, you obtain the 2p − 1 percentile
in the distribution $F_{1,n}$. For example, the 95th percentile of the $t_n$-distri-
bution is the 2(.95) − 1 = .90 = 90th percentile of the $t_n^2 = F_{1,n}$-distribution.
(This is true because 5% of the cases exceed the 95th percentile in $t_n$ and 5%
lie below the 5th percentile. When the t values are squared both the top
5% and the bottom 5% take on a positive sign; hence, 10% of the values in
$F_{1,n}$ exceed the square of ${}_{.95}t_n$.)
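The link between the percentiles of $t_n$ and those of $F_{1,n}$ can be seen in a simulation with n = 10, using the tabled value ${}_{.95}t_{10} = 1.812$. The sketch below is illustrative only: it draws t values and, separately, F-ratios with 1 and 10 degrees of freedom, and counts how many fall below $1.812^2$. The seed, the sample size, and the helper names are our assumptions.

```python
import random
from math import sqrt

def chi_square_draw(n, rng):
    """Sum of the squares of n independent unit normal z scores."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def t_draw(n, rng):
    """One t value with n degrees of freedom."""
    return rng.gauss(0.0, 1.0) / sqrt(chi_square_draw(n, rng) / n)

def f_draw(n1, n2, rng):
    """One F-ratio with n1 and n2 degrees of freedom."""
    return (chi_square_draw(n1, rng) / n1) / (chi_square_draw(n2, rng) / n2)

rng = random.Random(99)      # arbitrary seed, only for repeatability
n, trials = 10, 20_000
cutoff = 1.812 ** 2          # square of the 95th percentile of t with 10 df

share_t_squared = sum(t_draw(n, rng) ** 2 < cutoff for _ in range(trials)) / trials
share_f = sum(f_draw(1, n, rng) < cutoff for _ in range(trials)) / trials

print("share of t**2 below cutoff:", round(share_t_squared, 3))
print("share of F(1, 10) below cutoff:", round(share_f, 3))
```

Both shares should fall near .90 rather than .95, because squaring folds the lower 5% tail of the t values onto the upper tail.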


PROBLEMS AND EXERCISES 


1. Complete the following table by finding the value of the designated percentile
in the tables in Appendix A:

    Distribution      Degrees of freedom    Percentile    Value of the percentile
    a. Normal                               54th
    b. t              20                    97.5th
    c. Normal                               90th
    d. t              120                   90th
    e. Chi-square     6                     1st
    f. F              4 and 60              99th
    g. Chi-square     15                    99.9th
2. ... and infinite degrees of freedom ...

... distribution with ... of the ... percentile in the F-distribution
with 4 and ... degrees of freedom.

5. ... degrees of freedom equals 1,
i.e., the median of the F-distribution is 1. In which
direction is the F-distribution with 12 and 12 degrees of freedom skewed?
(Report your reasoning.)

6. The mean of the chi-square distribution with n degrees of freedom is n. The
one and only mode is at n − 2 for n greater than 2. We know that the chi-square
distribution is positively skewed. Is the median of the chi-square distribution
with n degrees of freedom above, below, or equal to n?

7. For reasonably large n, the chi-square distribution with n degrees of freedom
is nearly a normal distribution. The mean and standard deviation are n and
$\sqrt{2n}$, respectively, regardless of the size of n. For n = 20, find the 95th percentile
in the chi-square distribution from Table C in Appendix A and compare it
with the 95th percentile in a normal distribution with mean n = 20 and standard
deviation $\sqrt{2n} = \sqrt{40}$.

8. Prove that the variance of the t-distribution with n degrees of freedom is
n/(n − 2). (Hint: $t_n^2$ is the same as the F-distribution with 1 and n degrees of
freedom. Since $E(t_n) = 0$, and $\sigma_t^2 = E[t - E(t)]^2$, the variance of $t_n$ is $E(t_n^2)$.)
sample from a population. A sample is a part, or subset, of a population.
The sample is generally selected in a deliberate fashion from the population
in order that the properties of the population can be studied. Theoretically, 
populations can be either infinitely large or finite in size. The truly infinite 
populations that come easily to mind are somewhat artificial or conceptual: 
the collection of all positive numbers, the collection of all possible lengths 
of a stick, the collection of tosses of two dice which could be made throughout 
eternity. Almost any interesting population of physical items — as opposed 
to conceptual possibilities — is finite in size: all persons in the Western 
Hemisphere, the refrigerators produced in Canada in the last decade, the 
school districts in the United States of America. A finite population may be 
extremely large, e.g., the proverbial “grains of sand on earth” or 150!, but
if it is conceivable that the process of counting the elements of the population 
could be completed, then the population is finite. At times, but not often,
in inferential statistics it is important to distinguish between finite and infinite
populations. However, for the purposes of statistical inference it is generally
not necessary to worry about the distinction between finite and infinite 
populations whenever the size of the population is more than 100 times 
greater than the sample taken from the population. If the ratio of population 
size to sample size is larger than 100, the techniques appropriate to making 
inferences to finite populations and those appropriate for infinite populations 
give essentially the same results. It is customary to use statistical techniques 
based on the assumption that infinite populations are being sampled whenever 
the population is reasonably large (containing several hundred or more 
elements) and the sample from the population does not constitute an 
appreciable proportion of the population. It is common to speak of a 
population as being “virtually infinite” when one means to say that it is 
huge but finite and that statistical techniques that assume infinite populations 
will be used on it. We shall not discuss inferential statistical techniques 
that have been developed for “finite populations,” i.e., for small populations 
or when a sample being studied is more than 1/100, say, of the population. 
An excellent treatment of these “finite techniques” can be found in William 
G. Cochran’s Sampling Techniques (1963). 
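
The 100-to-1 rule of thumb can be checked numerically. The sketch below brings in one fact from outside this section, the standard finite population correction: when n elements are drawn without replacement from N, the variance of the sample mean is (σ²/n)(N − n)/(N − 1). The function name and the numbers used are illustrative assumptions, not part of the text.

```python
import math

def variance_of_mean(sigma2, n, N=None):
    """Variance of the sample mean for samples of size n.

    N=None treats the population as infinite; otherwise the standard
    finite population correction (N - n)/(N - 1) is applied (an outside
    fact, not derived in this section).
    """
    infinite_value = sigma2 / n
    if N is None:
        return infinite_value
    return infinite_value * (N - n) / (N - 1)

sigma2, n = 100.0, 50  # hypothetical population variance and sample size
for ratio in (2, 10, 100, 1000):
    N = ratio * n
    finite = variance_of_mean(sigma2, n, N)
    infinite = variance_of_mean(sigma2, n)
    print(f"N/n = {ratio:>4}: standard error, finite/infinite = "
          f"{math.sqrt(finite / infinite):.4f}")
```

Once the population is 100 times the sample, the two standard errors differ by less than one percent, which is why the infinite-population techniques are used in such cases.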

Measurements taken on populations of things can be described in the 
ways we have discussed in the preceding chapters. We can compute means, 
medians, variances, and percentiles on the data gathered from a population; 
we might calculate the correlation between height and weight for the popu- 
lation of high-school sophomores in the United States of America. The 
values of various descriptive measures computed for populations are called 
parameters. For samples, these same descriptive measures are called 
statistics. The parameter describes a population in the way a statistic describes
a sample. It is customary to denote statistics by Roman letters and parameters
by Greek letters. The symbol X̄. stands for the sample mean, and the Greek
letter μ stands for the population mean. The sample variance is denoted
by s², the population variance by σ².

A statistic computed on a sample can be regarded as estimating a 
parameter in the population. An estimator is some function of the scores in 
a sample that produces a value, called the estimate; an estimate gives us
some information about a parameter. For example, the sample mean X̄.
is an estimator of the mean or average score in the population. A random
sample of 100 eight-year-olds might yield 104.65 for a sample mean on the
California Test of Mental Maturity; this value, 104.65, would be an estimate
of the mean test score in the population from which the pupils were sampled. 

12.2 

RANDOM SAMPLING

in a large group of random samples. This is not possible or feasible with
many other types of sampling plan. For example, if one were to choose 
the first 50 men to walk down the street whose names began with the letter 
“T,” one would not have a random sample of the population of adult 
Americans. Furthermore, this sample of 50 is nonrepresentative of the 
population, and it is nonrepresentative in unknown ways and to unknown 
extents. However, if 50 persons were randomly sampled from the population 
of adult Americans it would be mathematically possible to answer such 
questions as “How likely is it that a randomly drawn sample of 50 adult 
Americans will have a mean height that is more than one inch above the 
mean height of all adult Americans?” 

The process of inferential statistical reasoning involves finding an 
estimate of a parameter from a sample and then determining how repre- 
sentative such a sample can be expected to be for the purpose of estimating 
the parameter. It is not surprising, then, that inferential statistics is based 
on assumptions of random sampling from populations. 

12.3 

THE CONCEPT OF A SAMPLING DISTRIBUTION

The statistician assesses the representativeness to be expected from random 
samples by studying sampling distributions. The concept of a sampling 
distribution is basic to one entire branch of inferential statistics. A statistic 
or estimator calculated on a sample is said to possess a certain sampling 
distribution. You can imagine the process of choosing sample after sample 
of size n from a certain population and recording for each sample the value 
of some estimator, e.g., the sample mean X̄. If this process of drawing
a sample from the population were repeated thousands of times, it would 
be possible to construct a frequency distribution of the thousands of sample 
means that were obtained. The frequency distribution so constructed would 
look like the sampling distribution of the mean of samples of size n for the 
population being sampled. If a frequency polygon were drawn for the data 
to a scale so that the area under the curve was one unit, the curve would 
be almost identical to the sampling distribution of the sample mean. 

Suppose, for example, that a certain population has several thousand
elements and that measurement of any one element will yield a score of
0, 1, 2, ..., 9 with equal probability. Hence, the random variable X takes
on any one of the values 0, 1, 2, ..., 9 with probability .10. A probability
distribution for the population has the form shown in Fig. 12.1.

One hundred random samples of size n = 2 were drawn from the above
population. For each sample, the mean was calculated. The first sample
contained the digits (3, 2), which yield a mean of (3 + 2)/2 = 2.5. Ninety-nine
other sample means were calculated, and all 100 means were graphed.

FIG. 12.1 Probability distribution for a population.


The frequency polygon of the 100 sample means appears in Fig. 12.2. 

Figure 12.2 gives us some idea of what the actual sampling distribution
of X̄. looks like for samples of size 2 from the population in Fig. 12.1. The
graph in Fig. 12.2 is an empirical approximation to the sampling distribution
of X̄. in this situation.
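
For a population this simple, the sampling distribution that Fig. 12.2 approximates can also be worked out exactly. The sketch below assumes that the two draws are independent (reasonable when the population has several thousand elements in equal proportions) and enumerates all 100 equally likely ordered pairs of digits; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

digits = range(10)  # population values 0..9, each with probability .10

# Every ordered pair (first draw, second draw) is equally likely: 100 pairs.
pair_means = [Fraction(a + b, 2) for a, b in product(digits, repeat=2)]
counts = Counter(pair_means)

# Exact sampling distribution of the mean: P(mean = m) = count / 100.
sampling_dist = {m: Fraction(c, len(pair_means))
                 for m, c in sorted(counts.items())}

mean_of_means = sum(m * p for m, p in sampling_dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p
                   for m, p in sampling_dist.items())

# Print the distribution as a crude histogram, one row per possible mean.
for m, p in sampling_dist.items():
    print(f"{float(m):4.1f}  {float(p):.2f}  " + "*" * counts[m])
print("mean of sample means    :", float(mean_of_means))
print("variance of sample means:", float(var_of_means))
```

The printed histogram is triangular and peaks at 4.5, even though the population itself is flat; an empirical polygon based on only 100 samples will be a rougher version of the same shape.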

Generally, the statistician does not have to rely on empirical procedures
(such as were reported in Fig. 12.2) to determine the sampling distribution
of a statistic. Advanced mathematical techniques can be used to answer
such questions as “What is the distribution of the mean of samples of size n
from a normal distribution?” or “What is the distribution of the product
moment coefficient of correlation of X and Y in samples of size n from a
population in which the correlation of X and Y is zero?” Because of the
difficult mathematics underlying the derivations of most of the sampling
distributions we shall use, we shall simply report the results without proof.

One of the principal theorems of inferential statistics concerns the
sampling distribution of the sample mean X̄. This theorem is called the
central limit theorem. Suppose that samples are being drawn from an
infinitely large population. The mean of this population is denoted by μ
and the variance by σ². Random samples of size n will be drawn from the
population. What will the sampling distribution of the sample mean X̄. look
like? If n is “sufficiently large” (and it is not possible to be more specific
about the size of n), the sample mean will be very nearly normally distributed.
Furthermore, the mean of all of the sample means will equal μ, the population
mean; and the variance of the sample means will equal σ²/n, where σ² is
the population variance.

The following example will be used to illustrate the central limit theorem.
Suppose that the population in Fig. 12.3 has a mean μ equal to 15 and a
variance σ² equal to 100. Random samples of size 100 will be drawn from
the population in Fig. 12.3. The central limit theorem tells us that the
distribution of the means X̄. of these samples will be nearly normal with a mean
of 15 and a variance of σ²/n = 100/100 = 1. The sampling distribution of
X̄. in this instance is depicted in Fig. 12.4.

It may seem incredible to you at first that, regardless of the shape of
the population being sampled, the means of “sufficiently large” samples will
have a normal distribution. Such is the case, however. This is one of three
reasons why the normal distribution is so important in statistics. Just how
large n must be before the sampling distribution of X̄. is nearly normal
depends on the shape of the population. Samples of size 100 are probably
large enough to yield nearly normal sampling distributions of X̄. for most
populations one might meet in practice.
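
The claim can be tried out by simulation. The sketch below does not use the population of Fig. 12.3, whose exact shape is not specified; instead it assumes a hypothetical, strongly skewed population (5 plus an exponential variable with mean 10) chosen only because it also has mean 15 and variance 100. The seed and the number of replications are arbitrary.

```python
import random
import statistics

random.seed(12)

def draw_score():
    # Hypothetical skewed population: 5 plus an exponential with mean 10,
    # so the population mean is 15 and the population variance is 100.
    return 5 + random.expovariate(1 / 10)

n = 100       # sample size
reps = 2000   # number of random samples drawn

sample_means = [
    statistics.fmean(draw_score() for _ in range(n)) for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
print(f"mean of sample means     = {mean_of_means:.3f}  (theory: 15)")
print(f"variance of sample means = {var_of_means:.3f}  (theory: 100/100 = 1)")
```

A histogram of `sample_means` would look close to the normal curve even though individual scores from this population are far from normal.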



FIG. 12.3 Distribution of X for a population. 


FIG. 12.4 Distribution of the means of random samples of size 100 from
the population in Fig. 12.3.

A proof of the fact that the sampling distribution of X̄. has a mean of
μ and variance of σ²/n, where μ and σ² are the mean and variance of the
population sampled and n is the sample size, is not too difficult to develop.
X is the variable being measured on the population; its mean is μ and its
variance is σ². A random sample of size n has a first element X₁, a second
element X₂, ..., and an nth element Xₙ. The ordering of the subscript is
not indicative of the size of the score in the sample; X₁ is merely the first
score chosen in each sample. Therefore, the collection of all possible X₁'s,
i.e., all first scores chosen in all possible random samples from the population,
forms a population with mean μ and variance σ². In other words, X₁,
X₂, ..., Xₙ are each random variables with a mean of μ and a variance of σ².

The sample mean equals (X₁ + X₂ + ... + Xₙ)/n. The mean of the
sampling distribution of X̄. equals the expected value of X̄.:

$$
\begin{aligned}
E(\bar{X}_{.}) &= E[(X_1 + X_2 + \dots + X_n)/n] \\
 &= \frac{1}{n}\,E(X_1 + X_2 + \dots + X_n) \\
 &= \frac{1}{n}\,[E(X_1) + E(X_2) + \dots + E(X_n)]. \qquad (12.1)
\end{aligned}
$$

Because each of the n expected values on the right equals μ, the expression
reduces to (1/n)(nμ) = μ.

Suppose we try to find the variance of the means of samples of size
n = 2 from a population. Let the population variance be σ². For each
sample, X̄. = (X₁ + X₂)/2 is calculated. X₁ and X₂ are arbitrary designations
for the first and second observations randomly drawn and are not related to
the size of the scores. Consequently, over all random samples, X₁ has
variance σ² and so does X₂. Because the samples are randomly drawn, there
is no relationship between the sizes of the first and second observations in
any sample. If we were to construct a scatter diagram for graphing the points
(X₁, X₂) from sample to sample, after hundreds of samples the scatter
diagram would show no correlation between X₁ and X₂. This is depicted
in Fig. 12.5.

Clearly, X₁ and X₂ are uncorrelated. Thus, the correlation coefficient
and covariance between the first and second observations in a sample over
infinitely many random samples from a population are zero.

Now the variance over random samples of X̄. = (X₁ + X₂)/2 is denoted
as follows:

$$\sigma^2_{\bar{X}_{.}} = \sigma^2_{(X_1 + X_2)/2}. \qquad (12.2)$$

The σ²'s are simply the expected, long-run average values of the
corresponding s²'s. Equation (12.2) shows that the variance of X̄. is the same
as the variance of 1/2 times the sum of X₁ and X₂. We saw in Sec. 5.8 that
the effect on the variance of a variable of multiplying the variable by a
constant was to multiply the variance by the square of the constant. Therefore,

$$\sigma^2_{\bar{X}_{.}} = \left(\tfrac{1}{2}\right)^2 \sigma^2_{(X_1 + X_2)}. \qquad (12.3)$$

In Sec. 7.9 we learned that if two variables are uncorrelated then the
variance of the sum of the two variables is the sum of their variances. Above
we argued that X₁ and X₂ are uncorrelated. Hence,

$$\sigma^2_{\bar{X}_{.}} = \tfrac{1}{4}\,\sigma^2_{(X_1 + X_2)} = \tfrac{1}{4}\left(\sigma^2_{X_1} + \sigma^2_{X_2}\right). \qquad (12.4)$$



FIG. 12.5 Scatter diagram of the relationship between the first (X₁)
and second (X₂) observations in random samples from a population.
(Horizontal axis: first observation in sample, X₁.)

The variance of X₁ over repeated random samples is σ², and so is the
variance of X₂. Therefore, we can write Eq. (12.4) as follows:

$$\sigma^2_{\bar{X}_{.}} = \tfrac{1}{4}(\sigma^2 + \sigma^2) = \tfrac{1}{4}(2\sigma^2) = \frac{\sigma^2}{2}. \qquad (12.5)$$

Equation (12.5) expresses the conclusion of the argument: the variance
of the mean of samples of size 2 from a population with variance σ² is equal
to σ²/2. In this instance, n = 2 and $\sigma^2_{\bar{X}_{.}} = \sigma^2/2$. This is no coincidence. It
is true in general that for random samples of size n, $\sigma^2_{\bar{X}_{.}} = \sigma^2/n$. Let us
explore this general statement further.

If random samples of size n are taken from a population with variance
σ², then the variance of the mean, X̄. = (X₁ + X₂ + ... + Xₙ)/n, over
samples is given by

$$\sigma^2_{\bar{X}_{.}} = \sigma^2_{(X_1 + X_2 + \dots + X_n)/n}. \qquad (12.6)$$

The right-hand side of Eq. (12.6) shows $\sigma^2_{\bar{X}_{.}}$ to be the variance of (1/n)
times the sum of the n uncorrelated variables X₁, X₂, ..., Xₙ. Therefore,

$$\sigma^2_{\bar{X}_{.}} = \left(\frac{1}{n}\right)^2 \sigma^2_{(X_1 + X_2 + \dots + X_n)}.$$

Each variable Xᵢ (i = 1, 2, ..., n) has a variance of σ² and is uncorrelated
with the other n − 1 variables. Therefore, the variance of the sum of the
n uncorrelated variables is the sum of the variances of the variables, because
each of the n(n − 1)/2 covariances is 0. Thus

$$\sigma^2_{\bar{X}_{.}} = \left(\frac{1}{n}\right)^2 \left(\sigma^2_{X_1} + \sigma^2_{X_2} + \dots + \sigma^2_{X_n}\right). \qquad (12.7)$$

Because each variable has the same variance σ², Eq. (12.7) can be written
as

$$\sigma^2_{\bar{X}_{.}} = \left(\frac{1}{n}\right)^2 (\sigma^2 + \sigma^2 + \dots + \sigma^2) = \left(\frac{1}{n}\right)^2 (n\sigma^2) = \frac{\sigma^2}{n}. \qquad (12.8)$$

A fundamental relationship is expressed in Eq. (12.8). The variance of the
means of random samples of size n from a population with variance σ² is equal
to σ²/n.

The expression σ²/n has traditionally been called the variance error of
the mean. The positive square root of Eq. (12.8) is another important
expression known as the standard error of the mean.

$$\sigma_{\bar{X}_{.}} = \frac{\sigma}{\sqrt{n}}. \qquad (12.9)$$

means of random samples of size 100 from that population was 1. This is
consistent with Eq. (12.9):

$$\sigma_{\bar{X}_{.}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1.$$
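
Equations (12.8) and (12.9) translate directly into two small functions. This is only an illustrative sketch; the function names are our own, and the inputs are the population variance and the sample size.

```python
import math

def variance_error_of_mean(sigma2, n):
    """Variance of the means of random samples of size n, Eq. (12.8)."""
    return sigma2 / n

def standard_error_of_mean(sigma2, n):
    """Positive square root of the variance error, Eq. (12.9)."""
    return math.sqrt(sigma2 / n)

# The population of Fig. 12.3: variance 100, samples of size 100.
print(variance_error_of_mean(100, 100))
print(standard_error_of_mean(100, 100))

# Quadrupling the sample size halves the standard error.
print(standard_error_of_mean(100, 25), standard_error_of_mean(100, 100))
```

Because n sits under a square root in Eq. (12.9), cutting the standard error in half requires four times as many observations.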


The estimation of a population correlation coefficient provides another 
illustration of the concept of a sampling distribution. Probably the variables 
“verbal intelligence” X and “reaction time” Y are virtually uncorrelated 
in the population of all twelve-year-olds in the U.S.A. Imagine for the 
moment that someone administered the Wechsler Intelligence Scale for 
Children to measure X and a reaction time test to measure Y to all children 
age 12 in the country. With these measures, a scatter diagram of the X 
and Y scores could be constructed and, also, the correlation between X and
Y could be calculated. Suppose, further, that the normal bivariate surface 
(see Sec. 6.6) proved to be an adequate description of the scatter diagram 
for the X and Y scores. Also, suppose that the value of the product-moment 
correlation coefficient, which we will denote by the Greek letter ρ, turned
out to be zero, i.e., ρ_xy = 0. (Since the correlation coefficient describes the
population in which we are interested, it is a parameter instead of a sample
statistic. We have followed the widely accepted custom of denoting all
parameters by Greek letters; statistics are denoted by Roman letters. Hence,
ρ is the correlation coefficient in a population, and r is a correlation coefficient
in a sample from the population.) A sample of 80 children could be drawn
randomly from the population and their X and Y scores observed. For this
sample, the value of r_xy, the sample correlation of “verbal intelligence” and
“reaction time,” could be computed. The sample of 80 could then be
returned to the population. A second sample of 80 could then be drawn
at random, r_xy computed, and the sample returned to the population. This
process could be repeated indefinitely; a large number of values of r_xy for
samples of size 80 from a bivariate normal population in which ρ_xy equals
0 could thus be accumulated. What would the frequency distribution of
the large collection of sample correlation coefficients look like? The
statistician can answer this question without going through the actual process of
drawing thousands of samples and computing r each time. He can show
mathematically that the distribution of these values of r for random samples
of size 80 from a bivariate normal population in which ρ = 0 is nearly a
normal distribution with mean 0 and variance 1/(n − 1) = 1/79. The
theoretical sampling distribution of r for random samples of size 80 from a
population with ρ = 0 appears in Fig. 12.6.
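
Although the statistician does not need to draw the samples, a computer can do so cheaply, and the result can be compared with the theoretical mean of 0 and variance of 1/79. The sketch below is an assumption-laden illustration: independent standard normal X and Y stand in for the bivariate normal population with ρ = 0, and the seed, the number of replications, and the helper `pearson_r` are our own choices.

```python
import math
import random
import statistics

random.seed(80)

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

n, reps = 80, 2000
rs = []
for _ in range(reps):
    # Independent normal X and Y: a bivariate normal population with rho = 0.
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    rs.append(pearson_r(xs, ys))

print(f"mean of r     = {statistics.fmean(rs):.4f}  (theory: 0)")
print(f"variance of r = {statistics.pvariance(rs):.5f}  "
      f"(theory: 1/79 = {1 / 79:.5f})")
```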

The standard deviation of the sampling distribution of r is called the
standard error of the correlation coefficient. In this particular case only, it
happens to be equal to 0.11. The standard error of r is denoted by σ_r.

Approximately 68% of the samples will yield values of r between −.11
and +.11. About 95% of the samples will yield values of r between −.22

the sample mode as an estimate of μ? It is certainly possible to do this;
we shall see, however, that by the criteria used in assessing the properties
of an estimator, X̄. turns out to be a better estimator of μ than either the
sample median or the sample mode.

In this section we shall also be concerned with the properties of estimators
of a population median, variance (σ²), standard deviation (σ), and correlation
coefficient (ρ). What are the different ways in which these parameters can
be estimated? Is one estimator to be preferred over all others for estimating
a certain parameter, and why? We shall look closely at three properties of
estimators.


Unbiasedness 

An estimator is said to be unbiased for estimating a parameter if the mean of 
the sampling distribution of the estimator equals the value of the parameter 
being estimated. 

Regardless of the nature of the population being sampled, the sample
mean X̄. is an unbiased estimator of the population mean μ. Notice in
Figs. 12.3 and 12.4 that the value of the population mean μ is 15 and that
the mean of the sampling distribution of X̄. is also 15. This example
illustrates the unbiasedness of X̄. as an estimator of μ. If samples are drawn
randomly from a normal distribution (or any other symmetric distribution),
then the sample median is also an unbiased estimator of the population mean
μ. In other words, the average of the medians of an infinite number of
random samples from a normal distribution equals μ, the mean of the
normal distribution (which is, of course, also its median and its mode).

There are many examples of biased estimators. Suppose we wish to
estimate ρ, the correlation between two variables that have a bivariate
normal distribution in the population. Imagine that for a particular
population ρ = .75. The mean of the sampling distribution of the sample
correlation coefficient r will be less than .75 for any finite sample size. Thus,
r is in general a biased estimator of ρ. If you have already looked back at
Fig. 12.6, you might doubt this statement. In Fig. 12.6, we saw that the
mean of the sampling distribution of r was 0 for samples of size 80 from a
population in which ρ = 0. Consequently, r was then an unbiased estimator
of ρ. It so happens that r estimates ρ unbiasedly only when ρ = 0. If ρ is
any other value from −1.00 to +1.00, there will be a bias in r as an estimator
of ρ. (Olkin and Pratt, 1958, have derived the unbiased estimator of ρ.
Its calculation on a sample is rather complex. Olkin and Pratt provided
tables for finding an unbiased estimate of the population correlation
coefficient.)

Though we are formally dealing with the property of unbiasedness of
estimators here for the first time in this text, this property influenced the
methods used to describe variation in Chapter 5. In Sec. 5.5 we chose to
measure the variation in a sample by the quantity s² = Σ(Xᵢ − X̄.)²/(n − 1).
It might have been more natural to measure variability by simply taking the
average of the squared deviations around the sample mean, but instead it
was decided to place (n − 1) and not n in the denominator of s². Now we
are in a position to elaborate on the motivation of this choice. The quantity
s² is an unbiased estimator of the population variance σ², whereas Σ(Xᵢ − X̄.)²/n
is negatively biased as an estimator of σ². That is,

$$E\left[\frac{\sum (X_i - \bar{X}_{.})^2}{n}\right] = \frac{n-1}{n}\,\sigma^2 < \sigma^2,$$

approaching equality only as n → ∞.

FIG. 12.7 Sampling distributions of s² and Σ(Xᵢ − X̄.)²/6 for random
samples of size 6 from a normal distribution with variance σ² = 100.
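
The difference between the two divisors can be verified exactly for a small case. The sketch below is an illustration under our own assumptions: it reuses the equally likely digits 0 through 9 as the population (not the normal population of Fig. 12.7), takes samples of size 3 with independent draws, and averages each statistic over all 1000 equally likely ordered samples.

```python
from fractions import Fraction
from itertools import product

population = range(10)   # digits 0..9, equally likely
mu = Fraction(sum(population), 10)
sigma2 = sum((x - mu) ** 2 for x in population) / 10   # population variance

def expected_value(statistic, n):
    """Exact mean of `statistic` over all equally likely ordered samples."""
    samples = list(product(population, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)

def ss(sample):
    """Sum of squared deviations around the sample mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

n = 3
e_unbiased = expected_value(lambda s: ss(s) / (n - 1), n)   # divisor n - 1
e_biased = expected_value(lambda s: ss(s) / n, n)           # divisor n

print("population variance:", float(sigma2))
print("E[SS/(n-1)]        :", float(e_unbiased))
print("E[SS/n]            :", float(e_biased))
print("(n-1)/n * sigma^2  :", float(Fraction(n - 1, n) * sigma2))
```

With the divisor n − 1 the long-run average equals the population variance; with the divisor n it falls short by the factor (n − 1)/n.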


so the sample standard deviation is a biased estimator of the population
standard deviation. The amount of bias depends on the shape of the
population being sampled. If the population is normal, the mean of the
sampling distribution of s is slightly less than σ. Specifically,

E(s) = 1*. - a ' (12.10) 

If n is fairly large, the bias in s is quite small. Nonetheless, s remains a
biased (but consistent) estimator of σ. As n → ∞, the expression in
parentheses in Eq. (12.10) approaches 1 and the bias disappears.
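
How quickly the bias in s disappears can be tabulated. The sketch below uses the standard closed form for normal populations, E(s)/σ = √(2/(n − 1)) · Γ(n/2)/Γ((n − 1)/2); this is brought in as an outside fact and may be written differently from Eq. (12.10). The function name is our own.

```python
import math

def expected_s_over_sigma(n):
    """E(s)/sigma for samples of size n from a normal population.

    Standard result: sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2).
    lgamma is used so that large n does not overflow.
    """
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)

for n in (2, 6, 10, 30, 100, 1000):
    print(f"n = {n:>4}: E(s)/sigma = {expected_s_over_sigma(n):.4f}")
```

The ratio is always below 1 and climbs toward 1 as n grows, which is the sense in which s is biased but consistent.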

TABLE 12.1 BIASEDNESS OR UNBIASEDNESS OF VARIOUS ESTIMATORS OF
PARAMETERS OF VARIOUS POPULATIONS

Parameter   Nature of population      Estimator   Status of the estimator
---------   -----------------------   ---------   -----------------------
μ           Any population            X̄.          Unbiased
μ           Symmetric                 Median      Unbiased
μ           Symmetric and unimodal    Mode        Unbiased
μ           Skewed                    Median      Biased
μ           Skewed                    Mode        Biased
σ²          Any population            s²          Unbiased
σ           Normal                    s           Biased negatively*
ρ_xy        Bivariate-normal          r_xy        Biased negatively

* For interesting discussion of unbiased estimation of the standard deviation, see
Cureton (1968b), Jarrett (1968), Cureton (1968c), and Bolch (1968) in that order.


Table 12.1 presents some parameters, their estimators, and statements
that the estimator is biased or unbiased. As you study Table 12.1, remember
that several different sample statistics may be used to estimate the same
parameter and that whether an estimator is biased or unbiased depends in
part upon the shape of the distribution of measures in the population from
which samples are drawn.


Consistency 


A second property of estimators is their consistency. A consistent estimator,
even though it may be biased, tends to get closer and closer to the value of
the parameter it estimates as the sample size becomes larger and larger.
Some estimators that are biased are consistent. For example, the sample
standard deviation is a biased but consistent estimator of σ. By taking a
large sample, s will be close to σ in value; the larger the sample becomes,
the closer s will be to σ. This is seen algebraically if you let n approach
infinity [...]. [...] the condition of consistency
[...] estimator of a parameter is calculated on a sample
in such a way that if the same calculation was performed on the entire
population [...] the value of the parameter. The sample mean is a
consistent estimator of μ since, if the sample were made as large as the
population [...]. This requirement of consistency makes good sense. It is
[...] estimators that might be presented [...].


Relative Efficiency

[...] “efficiency” refers to the precision with which an estimator estimates a
parameter; it refers to the variability of the estimate from sample to sample.
In a few previous examples we have measured this variability (or efficiency)
by taking the variance or standard deviation of [...].

[...] of any statistic is one of its most important properties. [...] is the
variance of the sampling distribution of the statistic.

[...] population mean of a particular normal distribution. One way [...] find
the mean X̄. of a sample [...]. However, [...]
or the sample median, varies less from sample to sample? Which has a
smaller variance error?

If the variance σ² of the normal population being sampled is 50 and
sample size n is 10, then the variance of X̄. over repeated random samples
is σ²/n = 50/10 = 5. What about the variance error of the sample median,
$\sigma^2_{Md}$? If thousands and thousands of random samples, each of size n, are
drawn from a normal population with mean μ and variance σ², and the
median Md is calculated for each sample, the frequency distribution of these
sample medians will be approximately normal with mean μ and variance
approximately (1.57)σ²/n. Hence, the variance error of the sample median is
approximately (1.57)σ²/n. Figure 12.8 depicts
the sampling distributions of X̄. and Md for samples of size 10 from a normal
distribution with variance 50.
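
The comparison can be tried by simulation. The sketch below draws 4000 samples of size 10 from a normal population with mean 20 and variance 50 and records both statistics; the seed and the number of replications are arbitrary choices. One caution, which is an outside fact rather than something stated in the text: the factor 1.57 (that is, π/2) is a large-sample value, so with n as small as 10 the simulated ratio of the two variance errors tends to come out somewhat below 1.57, though still clearly above 1.

```python
import random
import statistics

random.seed(1210)

mu, sigma2, n = 20, 50, 10
sigma = sigma2 ** 0.5
reps = 4000

means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
print(f"variance error of the mean   = {var_mean:.2f}  (theory: 50/10 = 5)")
print(f"variance error of the median = {var_median:.2f}")
print(f"ratio median/mean            = {var_median / var_mean:.2f}")
```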


FIG. 12.8 Sampling distributions of the sample mean X̄. (standard error 2.24)
and the sample median Md (standard error 2.80) for random samples of size 10
from a normal population with mean μ = 20 and variance σ² = 50.


The variance error of the sample median in Fig. 12.8 is equal to
(1.57)σ²/n = (1.57)(5) = 7.85. This figure reveals that the sample median
will vary more than the sample mean over repeated samples. Note that while
only 16% of the sample means will be larger than 25, about 26% of the sample
medians [...]. X̄. is a more efficient estimator of μ than Md.

Seeking greater descriptive precision, the statistician defines the efficiency of
X̄. relative to Md as the ratio of their variance errors. In this instance,

$$\text{relative efficiency} = \frac{\sigma^2/n}{1.57\,\sigma^2/n} = \frac{1}{1.57} = 63.7\%,$$

meaning that for normally distributed measures the median is less than
two-thirds as efficient as the mean, regardless of the magnitude of n.

One interpretation of a coefficient of relative efficiency is that if the
[...] observations is used to estimate the [...] same
degree of precision of estimation could be attained by drawing a sample of
64 observations and computing X̄.

Statisticians have [...] the criteria of unbiasedness and
efficiency when making [...] a “best” estimator of a parameter.
The sample mean, median, and mode might all be worthy
[...] of μ in a normal population. The first
question a statistician would probably ask is “Which ones are unbiased
estimators?” All three qualify on this criterion. The next question would
probably be “Which one is the most efficient?”, i.e., “Which one has the
smallest variance error?” The sample mode is least efficient, and we saw
that the sample median is less efficient than the sample mean. The sample
mean wins. In fact, the sample mean wins relative to any competition.
The primary reason that X̄. is used almost exclusively to estimate the
population mean μ of any population is that it has a smaller variance error
than any other unbiased estimator of μ. In this sentence you can see that
both the properties of unbiasedness and efficiency are important.

12.5 

INTERVAL ESTIMATION

The [...] we have discussed thus far in this chapter belong to one branch
of the theory of estimation of parameters. That branch is called point
estimation. [...] teachers in the [...] population [...]
and thus provides a point estimate of the population mean.

[...] builds [...] the concept [...]
a highly useful inferential statistical technique [...]. We shall encounter it
repeatedly in the remainder of this text.

[...] on the number line, and the value of the parameter [...] somewhere on
that interval. For example, the result of drawing a sample from a population
in order to estimate μ might be the interval (25.91, 38.65), which probably
contains between its bounds (lower bound 25.91 and upper bound 38.65) the
value of μ. Instead of calculating a single value as an estimate of a parameter,
we have now found a whole set of points, an interval, and one of
those points is probably the value of the parameter. The complication that
arises to make interval estimation a difficult concept to comprehend is the way
in which one determines exactly how probable it is that the parameter lies
on the interval. The remainder of this section is concerned with the mechanics
of constructing an interval estimate of a parameter in such a way that it
has a known probability of including the value of the parameter between its
limits. [...]

[...] limit theorem tells us about the sampling
distribution of X̄. We have seen that if random samples of size n are drawn
from a population with mean μ and variance σ², the sampling distribution
of X̄. will have mean μ and variance σ²/n, and will be nearly normal if n is
sufficiently large. The sampling distribution of X̄. is shown in Fig. 12.9.

FIG. 12.9 The sampling distribution of X̄. for random samples of size
n from a population.

In Fig. 12.9 the standard deviation of the sampling distribution of X̄.,
the standard error of the mean, is σ/√n. Since the distribution is normal,
about 68% of the observations lie within one standard deviation of μ, i.e., 68%
of all sample means that would be obtained in repeated random sampling
would lie on the interval μ ± σ/√n. Approximately 95%
of the means lie within two standard deviations of μ, since approximately 95%
of the area under the normal curve lies within two standard deviations of
the mean. By referring to a table of the unit normal distribution, we can
determine how many standard deviations we must go out from μ in
each direction to establish an interval that includes 80%, 90%, 99%, or any
other percent of the area under the normal curve. For example, we can see from
Table B in Appendix A that 5% of the area under the normal curve lies
above a z score of 1.64, and 5% lies below a z score of −1.64. In other words,
90% of the area under the normal curve lies within 1.64 standard deviations
on either side of the mean. For the sampling distribution of X̄.,
this means that 90% of the sample means that would be obtained in repeated
random sampling would lie between μ − (1.64σ/√n) and μ + (1.64σ/√n).
Hence the probability is .90 that a randomly drawn sample will have a mean
that is greater than μ − (1.64σ/√n) and less
than μ + (1.64σ/√n).

samples which might be drawn from the population, adding 1.64σ/√n to
X̄. and subtracting 1.64σ/√n from X̄. will establish an interval that does not
contain μ between its lower and upper limits. Such an interval is depicted

Suppose we now consider the entire collection of means of random
samples of size n from the population in question. This is an infinitely
large collection, and any single element of it is denoted by X̄.. We can think
of constructing an interval around each of these sample means by adding
and subtracting 1.64σ/√n from each mean. Now we have an infinite
collection of intervals. Ninety percent of these intervals contain μ between
their lower and upper limits; 10% do not. What is the probability
that a randomly selected interval from this infinite collection of intervals
will contain μ? Of course this probability is .90. The probability must be
.10, then, that an interval randomly selected in this manner will not
contain μ.

The mathematical statement of what we have been discussing is concise.
Since X̄. is distributed normally with mean μ and variance σ²/n, the
variable (X̄. − μ)/(σ/√n) has a normal distribution with mean 0 and
variance 1. We know, then, that

$$\text{prob}\left[-1.64 < \frac{\bar{X}_{.} - \mu}{\sigma/\sqrt{n}} < 1.64\right] = .90. \qquad (12.11)$$

Stated in words, the probability that the distance of X̄. from μ in units
measured by σ/√n is greater than −1.64 and less than 1.64 is .90. Multiplying
the inequality inside the brackets by σ/√n, subtracting X̄. throughout, and
multiplying by −1 (which changes the direction of the inequalities) gives

$$\text{prob}\left\{\left[\bar{X}_{.} - 1.64\frac{\sigma}{\sqrt{n}}\right] < \mu < \left[\bar{X}_{.} + 1.64\frac{\sigma}{\sqrt{n}}\right]\right\} = .90.$$



FIG. 12.11 Illustration of how an interval established around X̄. may
not capture μ within its limits.

Thus, the probability is .90 that μ is greater than X̄. − (1.64σ/√n) and less
than X̄. + (1.64σ/√n).
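
The long-run meaning of the .90 figure can be checked with a small simulation. The sketch below is only an illustration: the population values (μ = 100, σ = 15), the sample size (n = 16), the number of repetitions, and the function name `coverage` are all assumptions chosen for the example, not values from the text.

```python
import random


def coverage(mu=100.0, sigma=15.0, n=16, z=1.64, trials=10_000, seed=0):
    """Fraction of intervals (mean - z*sigma/sqrt(n), mean + z*sigma/sqrt(n))
    that contain mu over repeated random samples from a normal population."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    half_width = z * sigma / n ** 0.5  # 1.64 standard errors
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width < mu < sample_mean + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The proportion should come out close to .90.
    print(round(coverage(), 3))
```

Each simulated interval either captures μ or it does not; it is the proportion over the whole collection of intervals that approaches .90.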

When an interval estimate of a parameter is constructed so that it has
a certain known probability of including the value of the parameter between
its limits, the interval is called a confidence interval. The probability that
the confidence interval “captures” the parameter between its limits should be
used in identifying the interval. In the example developed above, a “.90
confidence interval,” or “90% confidence interval,” was constructed. The
confidence coefficient is the probability that a randomly selected interval
from the collection of all possible confidence intervals as defined in the above
way (i.e., X̄. ± zσ/√n) will capture the parameter. In the above example,
the confidence coefficient we chose to use was equal to .90. Another way
of saying this is as follows: “A confidence interval with confidence coefficient
.90 was constructed.” Finally, we speak of constructing a confidence interval
around a sample statistic and on a parameter, because for a given population
the parameter assumes just a single value. For example, X̄. ± (1.64σ/√n) is
the .90 confidence interval around X̄. and on μ.

The following example applies what has been learned about the rationale
of interval estimation. A researcher with limited resources wishes to
estimate the mean Total IQ, on the Wechsler Intelligence Scale for Children
(WISC), of the 35,000 fifth-grade pupils in his state. The WISC is an
individual intelligence test that yields measures of performance and verbal
intelligence and a Total IQ that is a combination of both; it must be
administered by trained examiners. The resources available to the researcher
will cover 900 test administrations, but no more. He is willing to assume
that the fifth-grade pupils in his state are as heterogeneous as the norm
group on the WISC, but he has some reason to believe that their average
score might deviate from that of the norm group. Hence he is willing to
believe that the variance of WISC Total IQ's in his state is 225, the same
as the variance in the norm group.

The problem of interval estimation when σ is not known is more complex.
Its solution marked the dawn of modern statistical methods; this solution
was not found until this century. It is due to W. S. Gosset, who wrote
under the pseudonym “Student.”

Our researcher is in the position of taking a random sample of 900
WISC Total IQ scores from a population of 35,000 scores in which the variance
is 225. He will calculate X̄. as an estimate of the unknown μ. He wishes
to establish a confidence interval around X̄., and he would like the confidence
coefficient for this interval to be .99. With samples as large as 900, one can
be confident in this situation that in repeated random samples X̄. is very
nearly normally distributed around μ with a standard error of σ/√n =
√(225/900) = 0.5. From Table B in Appendix A, it can be determined that
99% of the area under the unit normal curve lies within 2.58 standard-deviation
units of the mean. The sampling distribution of X̄. for samples
of 900 from a population with σ² = 225 appears in Fig. 12.12, where distance
along the baseline is in terms of σ/√n = 0.5. Thus, 99% of the area under this
curve lies within μ ± 1.29 because 2.58(0.5) = 1.29.

Confusion often arises in the interpretation of a confidence interval.
For example, some persons believe mistakenly that if the 95% confidence
interval on μ around a sample mean of 46.25 extends from 36.25 to 56.25,
then 95% of the sample means would be expected to fall between 36.25 and
56.25. This is an incorrect interpretation of a confidence interval. If 46.25
happens to lie far (perhaps 3σ_X̄) above (or below) μ, and there is no way
of knowing this from the sample, then less than half of the subsequent
sample means would fall between 36.25 and 56.25.
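
The “less than half” claim can be made concrete with a short calculation. The sketch below assumes, as the passage suggests, that the observed mean of 46.25 happens to lie exactly 3 standard errors above μ; the function name `share_of_means_inside` is hypothetical, and the standard error is inferred from the half-width of the interval (10 = 1.96 standard errors).

```python
from statistics import NormalDist


def share_of_means_inside(lower=36.25, upper=56.25, observed=46.25, errors_above=3.0):
    """Proportion of future sample means falling in (lower, upper) when the
    observed mean lies `errors_above` standard errors above the true mean."""
    se = (upper - observed) / 1.96       # standard error implied by the 95% interval
    mu = observed - errors_above * se    # assumed true mean (hypothetical scenario)
    sampling_dist = NormalDist(mu, se)   # sampling distribution of the mean
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)


if __name__ == "__main__":
    # Far less than 95%, and indeed less than half.
    print(round(share_of_means_inside(), 3))
```

Under this assumption only about 15% of subsequent sample means would land between 36.25 and 56.25, which is why the interval cannot be read as a statement about where sample means fall.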

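It is correct to state that “the probability is .95 that X̄. ± (1.96σ/√n)
will span μ,” or to write the same in symbols:

$$\text{prob}\left\{\left[\bar{X}_{.} - 1.96\frac{\sigma}{\sqrt{n}}\right] < \mu < \left[\bar{X}_{.} + 1.96\frac{\sigma}{\sqrt{n}}\right]\right\} = .95. \qquad (12.12)$$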

It is understood that the probability statement refers to the sample space of 
all intervals that could be formed by computing one interval for each sample. 



FIG. 12.12 Sampling distribution of X̄. for random samples of size 900
from a population with unknown mean μ and known variance σ² = 225.

The mathematical statement in Eq. (12.12) makes perfect sense because it
involves a variable, X̄., about which probability statements can be made.
However, after a single sample has been drawn and a single value like 46.25
attached to X̄., it is incorrect to say that 46.25 ± 1.96σ/√n has probability
.95 of spanning μ. It is not legitimate to write

$$\text{prob}\left\{\left[46.25 - 1.96\frac{\sigma}{\sqrt{n}}\right] < \mu < \left[46.25 + 1.96\frac{\sigma}{\sqrt{n}}\right]\right\} = .95.$$

The above statement of a probability makes no sense because there is no
quantity that varies from sample to sample.

The chances are 99 in 100 that the mean of a random sample will lie
less than 2.58σ/√n units from μ. In this case, 2.58σ/√n = 2.58(0.5) = 1.29.
The researcher knows this is true even though he does not know the value of
μ. Consequently, if he conceives of adding and subtracting 1.29 from all
sample means he would obtain in a series of random samples, he would
have a probability of .99 of capturing the true value of μ between the limits
of the confidence intervals he so constructed. Suppose now that he draws
his single sample and obtains a mean of 103.72. The .99 confidence interval
on μ around X̄. is calculated as follows:

$$\bar{X}_{.} \pm 2.58\frac{\sigma}{\sqrt{n}} = 103.72 \pm 1.29 = (103.72 - 1.29,\ 103.72 + 1.29) = (102.43,\ 105.01).$$
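
The arithmetic of the WISC example can be retraced in a few lines. This is a minimal sketch using the tabled value z = 2.58 from the text; the variable names are chosen here for illustration.

```python
import math

variance = 225       # known population variance of WISC Total IQ
n = 900              # sample size
z = 2.58             # tabled z score for a .99 confidence coefficient
sample_mean = 103.72

standard_error = math.sqrt(variance / n)   # sigma / sqrt(n) = 0.5
half_width = z * standard_error            # 2.58 * 0.5 = 1.29

lower = round(sample_mean - half_width, 2)
upper = round(sample_mean + half_width, 2)

if __name__ == "__main__":
    print(standard_error, half_width, (lower, upper))
```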

It is sometimes said that “the probability that (102.43, 105.01) captures
μ is .99.” This is a confusion and is not true. The value of μ either does or
does not lie between 102.43 and 105.01. Since the value of μ is unknown,
we do not know whether it does or does not lie in this interval; we know,
however, that only these two possibilities are logical. The interval (102.43,
105.01) is not a sample space (see Sec. 10.2), and it does not make sense
to speak of probabilities in connection with it. Probabilities and the
confidence coefficient apply to the repeated process of constructing confidence
intervals. R. von Mises, one of the two or three persons most
responsible for modern, statistical notions of probability, has said (1939,
pp. 11-14): “Our probability theory has no relation to questions such as:
‘Is there a probability of Germany being again involved in a war with
Liberia?’ ... we may say that in order to apply the theory of probability
we must have a practically unlimited sequence of similar observations.” If
the researcher in the above example were to repeat his actions of drawing
samples and constructing confidence intervals on μ indefinitely, then 99%
of the intervals he would produce would contain the value of μ. The sample
space to which the probability statement applies is the collection of all
possible confidence intervals. But any real-life researcher draws only a 
few samples, typically just one. Whether or not the confidence interval he 
constructs captures the parameter he is interested in cannot be known by him. 
It may and then it may not. All he knows is that he has performed an action 
that would result, if he were to repeat it thousands of times, in the parameter 
being captured between the limits of his confidence interval 99% (or 95%, 
or 90%, or any other percent he chooses) of the time. If his confidence 
coefficient is large enough, he cannot help but believe that this particular 
interval he has just calculated has captured the value of the parameter. 
(This is one good reason for keeping the confidence coefficient large — .90, 
.95, .99, or even larger — as is typically done.) However, if he is rational, 
he will acknowledge the slight probability (.01, or .05, or .10, or any other 
value he chooses) that his belief is mistaken. 

Summary of the construction of a confidence interval on μ around X̄. for
large samples when σ² is known. Samples of size n, sufficiently large to
insure the near normality of the sampling distribution of X̄., are drawn from
a population with unknown mean μ and known variance σ². A confidence
interval on μ is to be constructed around X̄.. A confidence coefficient of
1 − α is chosen. For example, if a confidence coefficient of .95 is desired,
α = .05. (This rather backward way of denoting the confidence coefficient,
namely as 1 − α, has a purpose that will become evident after more work
in statistical inference. The notation is due partly to historical precedent,
however. In Sec. 13.4, the origin of the α notation will be presented.)

From Table B in Appendix A, the value of the z score above which
100(α/2)% of the area under the unit normal curve lies is found. Denote
this z score by z_{1−(α/2)}. The 1 − α confidence interval around X̄. is given by
the following formula:

$$\left(\bar{X}_{.} - z_{1-(\alpha/2)}\frac{\sigma}{\sqrt{n}},\ \ \bar{X}_{.} + z_{1-(\alpha/2)}\frac{\sigma}{\sqrt{n}}\right). \qquad (12.13)$$

An expression equivalent to Eq. (12.13) is the following:

$$\bar{X}_{.} \pm z_{1-(\alpha/2)}\frac{\sigma}{\sqrt{n}}. \qquad (12.14)$$

Equations (12.13) and (12.14) will be illustrated on the following data:

1. n = 400, σ² = 36, and the confidence coefficient is arbitrarily chosen
to equal .95; i.e., 1 − α = .95, so α = .05. From Table B in
Appendix A we see that 100(α/2)% = 2.5% of the area under the
unit normal curve lies above a z score of 1.96. Hence, z_{.975} = 1.96.

2. The mean of the sample of 400 observations equals 51.04.
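
Equations (12.13) and (12.14) can be sketched as a small function and applied to the data just listed. The function name `z_interval` is an assumption made for this illustration, and the z score is obtained from Python's `statistics.NormalDist` rather than from Table B, so it is 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, variance, n, confidence=0.95):
    """Confidence interval on the population mean around a sample mean
    when the population variance is known and n is large."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z score with alpha/2 of the area above it
    half_width = z * sqrt(variance / n)       # z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


if __name__ == "__main__":
    lower, upper = z_interval(51.04, 36, 400)
    # Standard error is 6/20 = 0.3, so the limits are roughly 51.04 -/+ 0.588.
    print(round(lower, 3), round(upper, 3))
```

With n = 400 and σ² = 36 the standard error is 0.3, so the .95 interval works out to roughly 50.45 to 51.63.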
The standard deviation of the sampling distribution of r for random
samples of size n from a bivariate normal population with correlation
coefficient ρ is

$$\sigma_r = \frac{1 - \rho^2}{\sqrt{n - 1}}.$$

As ρ increases from 0 to +1, the sampling distribution of r becomes
increasingly negatively skewed. As ρ decreases from 0 to −1, the sampling
distribution becomes increasingly positively skewed.

Perhaps you can appreciate the impossible state of affairs that would
exist if we tried to use the above sampling distribution information about r
to construct confidence intervals. You will surely be impressed by the
insightful solution to this problem that the English statistician
Sir Ronald A. Fisher (1890-1962) produced. Fisher determined that
a particular mathematical transformation of the sample correlation coefficient
would yield a quantity with a sampling distribution that would have the
same variance regardless of the value of ρ. This transformation is called
Fisher's Z-transformation. The Z-transformation of any r is denoted by Z_r;
the Z-transformation of the value ρ is denoted by Z_ρ. Z_r has the following
formula:

$$Z_r = \log_e \sqrt{(1 + r)/(1 - r)}. \qquad (12.15)$$

If we want to find the Z-transformation of the value of ρ, we simply
substitute ρ for r in Eq. (12.15):

$$Z_\rho = \log_e \sqrt{(1 + \rho)/(1 - \rho)}.$$
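
Eq. (12.15) is easy to evaluate directly, which gives a way of checking tabled values of Fisher's Z. The sketch below uses a function name, `fisher_z`, chosen here for illustration; the two r values are simply examples.

```python
import math


def fisher_z(r):
    """Fisher's Z-transformation of a correlation coefficient, Eq. (12.15)."""
    return math.log(math.sqrt((1 + r) / (1 - r)))


if __name__ == "__main__":
    print(round(fisher_z(0.395), 3))    # Z for r = .395
    print(round(fisher_z(-0.775), 3))   # a negative r simply yields a negative Z
    # The same quantity is the inverse hyperbolic tangent of r.
    print(math.isclose(fisher_z(0.395), math.atanh(0.395)))
```

Because the transformation is symmetric about zero, the Z for a negative r is the Z for the corresponding positive r with a minus sign attached.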

The mathematical symbolism in Eq. (12.15) may look baffling to you.
Virtually no one ever computes Z_r from Eq. (12.15), but uses a table
instead; the calculations have been performed and tabulated for convenience
in Table G in Appendix A. The value of Z_r is given for values
of r from 0 to +1.00. If r is negative, simply give Z_r a
minus sign. Verify that a Z_r of .418 corresponds to an r of .395, and that
r = −.775 gives a Z_r of −1.033. The Z-transformation is graphed in
Fig. 12.13.

FIG. 12.13 Relationship between r and Fisher's Z-transformation of r.
(Redrawn from Fig. 3-10 in Julian C. Stanley, Measurement in Today's
Schools, 4th Ed., © 1964, by permission of Prentice-Hall, Inc., Englewood
Cliffs, N.J.)

Suppose we set out to build a sampling distribution of r by taking
thousands of random samples of size n from a bivariate normal population
with correlation ρ and computing r for each. Instead of building
up a frequency distribution of the r's, however, suppose we built a frequency
distribution of the Z_r's. How would such a frequency distribution
look? The sampling distribution of the Z_r's would be nearly normal with
mean Z_ρ and variance 1/(n − 3). This is just what is needed for a solution
to the problem of constructing confidence intervals around r. The standard
deviation of Z_r over repeated random samples is 1/√(n − 3), regardless of
the value of ρ. Hence, 90% of the Z_r's obtained in repeated random samples
will lie within 1.64 standard deviations, a distance of 1.64(1/√(n − 3)), of
Z_ρ; 95% of the Z_r's will lie within a distance of 1.96(1/√(n − 3)) of Z_ρ;
etc. The distribution of Z_r is approximately normal regardless of the size
of n. Consequently, if we add and subtract some multiple of 1/√(n − 3) from
Z_r we will have a specified probability of capturing Z_ρ within these
intervals. The following is another way of conceptualizing the problem.
Since Z_r is normally distributed with mean Z_ρ and standard deviation
1/√(n − 3), then

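$$\frac{Z_r - Z_\rho}{1/\sqrt{n - 3}}$$

is normally distributed with mean 0 and standard deviation 1. Consequently,

$$\text{prob}\left[-1.96 < \frac{Z_r - Z_\rho}{1/\sqrt{n - 3}} < 1.96\right] = \text{prob}\left[Z_r - 1.96\frac{1}{\sqrt{n - 3}} < Z_\rho < Z_r + 1.96\frac{1}{\sqrt{n - 3}}\right] = .95.$$

Therefore, the interval

$$\left(Z_r - 1.96\frac{1}{\sqrt{n - 3}},\ \ Z_r + 1.96\frac{1}{\sqrt{n - 3}}\right) \qquad (12.16)$$

captures Z_ρ with probability .95; and the interval

$$\left(Z_r - 2.58\frac{1}{\sqrt{n - 3}},\ \ Z_r + 2.58\frac{1}{\sqrt{n - 3}}\right) \qquad (12.17)$$

captures Z_ρ with probability .99.
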

To illustrate, a confidence interval with confidence coefficient .95 will be
constructed around an r of .245 for a sample in which √(n − 3) = 9. First, r is
transformed into Z_r by reference to Table G in Appendix A:
Z_{.245} = .250.

Second, 1.96 times the standard error of Z_r is found:

$$1.96\frac{1}{\sqrt{n - 3}} = 1.96\left(\frac{1}{9}\right) = .218.$$

The 95% confidence interval on Z_ρ is found by substituting into Eq. (12.16).
The lower and upper limits of the interval are

$$Z_r - 1.96\frac{1}{\sqrt{n - 3}} = .250 - .218 = .032$$

and

$$Z_r + 1.96\frac{1}{\sqrt{n - 3}} = .250 + .218 = .468,$$

respectively.





The interval (.032, .468) was generated by a process having a probability
of .95 of producing an interval that captures the Z-transformation of ρ.
We can interpret the confidence interval more intelligibly if everything is
transformed back from Fisher Z scores into correlation coefficients. So we
read Table G backwards and find the values of r which correspond to Z scores
of .032 and .468. An r of .032 corresponds to a Z of .032 and an r of .436
corresponds to a Z of .468. Therefore, the 95% confidence interval around
an r of .245 extends from .032 to .436. We feel quite confident that the value
of ρ, the population correlation coefficient, is between .032 and .436.
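
The whole procedure (transform r to Z, build the interval on the Z scale, transform the limits back to r) can be sketched as follows. The function name `r_interval` is hypothetical, and n = 84 is an assumption inferred from the standard error of 1/9 used in the example, since 1/√(84 − 3) = 1/9. The code computes the transformation exactly instead of reading Table G, so the limits may differ from the tabled ones in the third decimal place.

```python
import math


def r_interval(r, n, z=1.96):
    """Confidence interval on rho around a sample r via Fisher's Z."""
    z_r = math.atanh(r)                  # Fisher's Z; same as log_e sqrt((1+r)/(1-r))
    half_width = z / math.sqrt(n - 3)    # z times the standard error of Z_r
    lower_z, upper_z = z_r - half_width, z_r + half_width
    # tanh is the inverse of the Z-transformation ("reading the table backwards").
    return math.tanh(lower_z), math.tanh(upper_z)


if __name__ == "__main__":
    lower, upper = r_interval(0.245, 84)   # n = 84 is an inferred, assumed value
    print(round(lower, 3), round(upper, 3))
```

Note that the resulting interval is not symmetric about r = .245; the asymmetry comes from transforming back from the Z scale.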

12.6 

CONCLUSION 


Only a portion of the subject of interval estimation has been presented in
this chapter. In later chapters the sampling distributions of some
important statistics will have to be described. The solution to the important
problem of placing a confidence interval on μ when σ is unknown will
be shown. We shall also see how it is possible to place a confidence interval
around the difference between two sample means (the means of two separate
groups of observations on the same variable), for the
purpose of estimating the difference between the two population means.
(Such a technique will prove to be useful for answering important
questions about the superiority of one population over another.) But for
almost every example presented, the basic rationale
is the same. (The only exception will be met in Chapter 16.)

In Chapter 13 we introduce a second branch of the subject of
inferential statistics: “hypothesis testing,” a statistical inferential
methodology that has been widely applied in research in education and the
behavioral sciences. We shall find that hypothesis testing is closely related
to interval estimation. In fact, though the close relationship is often
obscured to the eyes of the student of statistics, interval estimation and
hypothesis testing are virtually two sides of the same coin.


PROBLEMS AND EXERCISES 

1. By using the table of random digits (Table A in Appendix A), draw a random
sample of eight students from the following set of sixteen:

   John     Al        Joan     Phil
   Mary     Tom       Susan    Paul
   Alice    Maurice   Martha   Edith
   Bob      Barbara   Jack     Warren

2. In which of the following pairs of terms do the two terms in the pair stand for
exactly the same thing?

a. 1: the standard error of X̄.; 2: the standard deviation of the random-sampling
distribution of X̄.

c. 1: the variance error of X̄.; 2: the variance of the sampling distribution of X̄.

d. 1: the population variance σ²; 2: n times the variance error of X̄.

e. 1: the mean of the sampling distribution of s²; 2: σ²

3. In general, the sampling distribution of s² is positively skewed with a mean of
σ². Recalling the relationship between the mean and median in a positively
skewed distribution, is the probability that s² exceeds σ² greater or less
than .50?

4. Samples of size n are to be drawn from a population with mean 220 and variance
... Find the standard error of X̄. for various sample sizes:

a. 2
b. 4
c. 8
d. 16
e. 32
f. 64
g. 1000
h. 2000

5. When samples of size n are drawn from a normal population with mean μ and
variance σ², the variance error of X̄. is σ²/n but the variance error of the sample
median Md is 1.57σ²/n. Suppose that for a particular problem, μ = 100, σ² = 25,
and n = ...; the variance error of X̄. is σ²/n = ... If one chose to estimate μ
with Md, how large a sample would have to be taken so that the variance error
of Md would also equal ...?

6. A sample of size n is to be drawn at random from a population with mean μ
and variance σ². The sample size is sufficiently large that X̄. can be assumed to
have a normal sampling distribution. Determine the probabilities with which
X̄. will be between the following pairs of points:

a. μ + σ/√n and μ − σ/√n

b. μ + 1.64σ/√n and μ − 1.64σ/√n

c. μ + 2.58σ/√n and μ − 2.58σ/√n

d. μ + 0.675σ/√n and μ − 0.675σ/√n

e. μ + 3σ/√n and μ − 3σ/√n

7. A sample of 100 persons was randomly drawn from a population with variance
16 but with an unknown mean μ. The value of X̄. obtained was 106.75.
Construct the 95% confidence interval on μ around X̄.

8. Construct the 95% and 99% confidence intervals on ρ for the following cases:
a. n = 28, r = +0.36   b. n = 12, r = −0.65   c. n = 300, r = +0.14

9. Which one of the following sample statistics has the smaller variance over random
samples of size n (“variance error”)?


2f| + » , . 4- X„ 
13.1

INTRODUCTION


In Chapter 12 the statistical inferential technique known as interval estimation
was developed for a few examples. Interval estimation is just one
technique, though a useful and important one, of the body of statistical
inferential techniques. A second technique is hypothesis testing. Published
research is seldom seen that does not involve the utilization of interval
estimation or hypothesis testing. This state of affairs is partly a reflection
of the rigid and uncritical standards imposed by some who administer
research (as in the case of the academic adviser who demanded that his
advisee use statistical inferential techniques on data from all 50 states, which
exhausted the population of interest) and partly, the greater part, we hope,
an indication of the genuine utility and necessity for employing such
techniques.


The number of new concepts to be introduced and comprehended in 
connection with hypothesis testing will make the discussion to follow a 
challenge. Fortunately, some of the concepts of interval estimation also 
play central roles in hypothesis testing. We shall find that the concepts of 
random samples, the sampling distributions of statistics, and probability 
values applying to statements are building blocks for hypothesis testing as 
well as for interval estimation. Hypothesis testing and interval estimation
are carried out with different languages, but we shall see that they usually
produce equivalent results or results that are easily converted from one
technique to the other. The basic problem, however, remains “How does 
one infer properties of the population from observation of a sample?'' 


13.2 

SCIENTIFIC AND 
STATISTICAL HYPOTHESES 


Hypothesis testing had its beginnings in the early eighteenth
century. An early example appears
in a publication dated 1710 by John Arbuthnot (1667-1735);
it is titled “An Argument for Divine Providence, Taken from the Constant
Regularity Observed in the Births of Both Sexes.” Noting that for 82
consecutive years the records showed a greater number of males born than
females, Arbuthnot argued that the hypothesis that male and female births
are equally likely (each with probability of 1/2) was refuted by the data,
for if the probability were precisely 1/2, then the probability
of 82 consecutive years in which more males than females were born would
be infinitesimally small, (1/2)^82 to be exact. Arbuthnot concluded that the
greater proportion of male births over female births was an act of Divine
Providence; the sacred institution of monogamy was being maintained since
males were more likely to be killed before reaching adulthood.
Arbuthnot's statistics were unimpeachable; his theology failed to account
for polygamous societies. Like Arbuthnot, the modern researcher is much
concerned with the probabilistic consequences of various hypotheses.

A researcher is interested in studying the relationship between creativity
and anxiety in fifth- and sixth-grade pupils. (The example is based on a
study by Ohnmacht, 1966.) His search of the literature has disclosed one
group of persons believing that the creative thinker should be less anxious
than the uncreative thinker, and another group believing that the two
characteristics are not related in any way. Our researcher has not yet joined
either camp. He is undecided on this matter and intends to satisfy his
curiosity with a small empirical study of his own. First he must decide
how he will measure creativity and anxiety. Two tests that appear to have
some validity for measuring the two characteristics are found: Getzels and
Jackson’s test “Uses of Things” is taken as a measure of creativity, and the
“Children’s Manifest Anxiety Scale” by Castaneda, McCandless, and
Palermo is taken as the measure of anxiety. There are over 20,000 fifth- and
sixth-grade pupils available for study, but the resources available to our 
researcher make it possible for him to observe only about 200 of them. 
Being an excellent student of statistics, our researcher plans to draw a 
random sample of 200 pupils from the population of 20,000 so that he can 
draw statistical inferential conclusions about the population based on his 
sample observations. Each of the 200 pupils in the sample will be given the 
Uses of Things test and the Children’s Manifest Anxiety Scale. The sample 
product-moment correlation coefficient between the measure of creativity 
and the measure of anxiety will be computed. Our researcher could proceed 
to establish a confidence interval around this sample r by the techniques 
outlined in Sec. 12.5. However, he has been trained in a somewhat different 
school of statistical thought, one in which decisions are paramount; this being 
so, he proceeds differently. 

The decision that the researcher will supposedly make is a decision about 
the truth or falsity of a statistical hypothesis. There are at least two types 
of hypotheses it would be well to identify and distinguish : scientific hypotheses 
and statistical hypotheses. A scientific hypothesis is a suggested solution to 
a problem. It is an intelligent, informed, and educated guess. A scientific 
hypothesis is generally stated as a proposition. “It is an empirical propo- 
sition in the sense that it is testable by experience; experience is relevant 
to the question as to whether or not the hypothesis is true. . .” (Braithwaite, 
1953). The formulation of a good scientific hypothesis is truly a creative 
act. On the other hand, a statistical hypothesis is merely a statement about 
an unknown parameter. For the next few pages we shall use H: (statement)
to denote a statistical hypothesis. “H: μ = 125” is a statistical hypothesis;
it is an assertion that the unknown mean of a particular population is 125.
Clearly, such a statement is either true or false. The decision “H: μ = 125
is false” is an example of the type of decision with which hypothesis testing
is concerned. “H: ρ = 0, where ρ is the correlation coefficient in a bivariate
normal distribution,” is another example of a statistical hypothesis. “H:
σ₁² = σ₂²” is a statistical hypothesis stating that the variances of populations
1 and 2 are equal. How would you denote the hypothesis that the means of
populations 1, 2, and 3 are all equal to each other?

It is important to distinguish scientific and statistical hypotheses. It is 
quite possible to test statistical hypotheses about very mundane matters that 
possess limited generality and not a whit of scientific importance. For 
hypothesis tester, he views the statistical inferential problem of reasoning
from the sample and r to the population and ρ as one of making a decision
about a hypothesis that asserts ρ is a particular number. Partly out of habit,
partly because of tradition, and partly because it is a sensible choice, this
researcher establishes the hypothesis he wishes to test as H: ρ = 0, i.e., that
the correlation between creativity and anxiety in the population is zero.
This is his statistical hypothesis. On the basis of the observations he will 
make on a random sample of 200 pupils from the population , he will decide 
to accept his hypothesis as true or reject it as false. The techniques he will 
use to make a decision about the truth of the statistical hypothesis comprise 
what is called the hypothesis test. 

What constitutes a legitimate and rational test of the hypothesis H: ρ = 0
in this situation? Should the researcher compute r for his sample of 200 and
decide that H is true if r is zero and decide that H is false if r is not zero?
Obviously not; we know too much about the erratic behavior of sample
estimates to agree to such a plan. It is quite possible for ρ to equal 0 in the
population and for r to be substantially different from 0 in a sample of 200.
In fact, it is not even an impossibility that a sample of 200 from a population
with ρ = 0 will yield an r of +1 or −1! It is extremely improbable, but it is
certainly possible. This presents a perplexing problem. Even if ρ = 0 in the
population, any value of r from −1 to +1 is a “possibility” in a random
sample of 200. Consequently, regardless of the value of r for the sample of
200, the researcher cannot with certainty conclude that ρ is or is not zero.
This is an important principle that underlies all tests of statistical hypotheses, 
and we shall restate it: In testing any statistical hypothesis the researcher's 
decision that the hypothesis is true or that it is false is never made with certainty ; 
he always runs a risk of making an incorrect decision. The essence of statistical 
hypothesis testing is that it is a means of controlling and assessing that risk. 
As we elaborate the example introduced, we shall learn the rationale of 
controlling and assessing the risk of deciding incorrectly about the truth of 
a hypothesis. 

The next step after stating the hypothesis to be tested is to draw a sample 
from the population and make the observations that will reflect on the hy- 
pothesis. The researcher has drawn a random sample of 200 pupils, measured 
them on both the anxiety and creativity tests, and correlated the scores. 
The value of r for the sample was +.09. 

The uncertainty in making a decision about H: ρ = 0 arises from the
phenomenon of sampling fluctuation, usually called sampling error. This 
is not a new idea to us in this text. Our entire discussion of inferential 
statistics has been concerned with the problem of how to handle estimation 
of a parameter by a sample value that is almost certainly in error to a greater 
or lesser degree. As before, we attack the inferential problems of hypothesis 
testing with the concept of a sampling distribution. 
FIG. 13.1 Sampling distribution of r for samples of size 200 from a
bivariate normal population with ρ = 0.


After stating the hypothesis, one must determine the sampling
distribution of the estimator of the parameter about which the hypothesis is made.
Furthermore, one finds that sampling distribution that would result if the
hypothesis being tested were true. In our example, we must determine the
sampling distribution of r for random samples of size 200 from a bivariate
normal population in which ρ = 0. Fortunately, this has been done before
and the result is a convenient one. The sampling distribution of r for
samples of size 200 under these circumstances is nearly normal, with mean 0
and standard deviation 1/√(200 − 1) = .071.

Figure 13.1 shows the sampling distribution of r when the hypothesis
H: ρ = 0 is true. This distribution, when considered in connection
with the actual r in the sample, makes it possible to judge whether the
hypothesis is plausible without knowing the entire bivariate population. The
question is “Is it plausible that ρ = 0 when the sample yields an r of .09?”
An equivalent question is “Is it reasonable that a population with ρ = 0
would yield an r of .09?” The answer to the question depends on how we
give an exact meaning to “plausible” and “reasonable.”

Suppose that in a certain locale it rains on 90% of the days. Is it
“reasonable” to wake up one morning and announce for no reason whatsoever
that today it will rain? Yes, it probably is. If, on the other hand, it rains on
only 10% of the days, it is not very reasonable to make such an announcement
on any given day. Ninety percent makes it plausible to expect rain every day;
10% makes it implausible. These are crude definitions of “plausible” and
“reasonable,” but the arbitrary logic differs little from that which the
statistician uses when he says it is unreasonable to assert at random that
“today it will rain” if it rains on only 10%, 5%, or 1% of the days.

If an event has a probability of .05 of occurring at random, it would be
“unreasonable” or “implausible” to expect it to occur on the single trial of
the event which we observe. We could have chosen a probability of .10 or .01
or .001. Our choice of .05 was arbitrary. It begins
to become reasonable to expect to observe an event on a single trial when the
probability of the event increases above .10, say.

FIG. 13.2 Sampling distribution of r for samples of size 200 from a
bivariate normal population in which ρ = .20.

If the hypothesis being tested, H: ρ = 0, is false, then ρ is either greater
or less than 0. If ρ is not zero, we would expect to obtain a sample r which
is somewhat above or somewhat below the major portion of the distribution
in Fig. 13.1. For example, if ρ is .20, the sampling distribution in Fig. 13.2
would result from repeated sampling of the population.

Consequently, if H: ρ = 0 is true, the sampling distribution in Fig. 13.1
will hold. If H: ρ = 0 is false, the distribution of r will lie generally higher
on the scale, if ρ is above 0, or generally lower on the scale, if ρ is below 0,
than the distribution in Fig. 13.1. Hence, a very large value of r, e.g., r =
.50, is an unlikely event if the hypothesis H: ρ = 0 is true, but it is much
more likely to happen if ρ is above zero, e.g., if ρ = .40 or .59. Similarly,
a very small value of r, e.g., r = −.60, is unlikely to occur if ρ = 0 and
somewhat more likely to occur if ρ is below zero. For these reasons, the values
of r that make the truth of H: ρ = 0 unlikely or implausible lie to the right
of some point above zero and to the left of some point below zero in Fig. 13.1.
Now we want to determine some such points exactly.

The standard deviation σ_r of the distribution in Fig. 13.1 is .071;
thus, the probability that a value in the distribution exceeds (1.96)σ_r = .140
is .025. The probability that a value is below −.140 is also .025. Therefore,
the probability is .05 that a value sampled from the distribution in Fig. 13.1
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will lie above .140 or below —.140. In less precise terms, it is unlikely to 
expect a value of r greater than .140 or less than —.140 to be sampled from 
the distribution in Fig. 13.3. 
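
The two tail areas quoted above can be checked numerically. The following is a minimal sketch, assuming (as the text does) that r is normally distributed with mean 0 and standard deviation .071 when p = 0; the variable names are ours and purely illustrative.

```python
from statistics import NormalDist

# Sampling distribution of r under H: p = 0 for n = 200, as given in the text.
r_dist = NormalDist(mu=0.0, sigma=0.071)

upper_tail = 1 - r_dist.cdf(0.140)   # probability that r exceeds .140
lower_tail = r_dist.cdf(-0.140)      # probability that r falls below -.140
both_tails = upper_tail + lower_tail

print(f"P(r > .140)   = {upper_tail:.3f}")
print(f"P(r < -.140)  = {lower_tail:.3f}")
print(f"P(|r| > .140) = {both_tails:.3f}")
```

Each tail comes out close to .025 and the two together close to .05; small departures in the third decimal reflect the rounding of .071 and .140.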

In repeated random samples of size 200 from a population in which 
p = 0, a value of r above .140 or below —.140 will occur one time in twenty 
samples on the average, i.e., the probability is .05 that a random sample will 
produce an r greater than .140 or less than —.140. When we consider 
jointly the decisions our researcher can make about the hypothesis and the
possible outcomes of selecting a sample and computing r, we find that there 
are four possibilities. Two possible decisions (after observing the data) 
are reasonable, and the other two are unreasonable, as the accompanying 
table shows. 


The researcher      The sample produced an r which was
decided that        between -.140 and .140         above .140 or below -.140

H was true          The researcher's decision      The researcher's decision
                    was reasonable                 was unreasonable

H was false         The researcher's decision      The researcher's decision
                    was unreasonable               was reasonable

Let us see what is implied when a researcher decides H: p = 0 is true after
an r on a sample of 200 is observed that lies either above .140 or below
-.140. In the table we have called such a decision in the light of such
evidence “unreasonable.” Why is it “unreasonable”? The decision is
unreasonable because values of r deviating from 0 by more than .140 are not
normally to be expected (they will occur in only 5% of all possible random
samples) when H: p = 0 is true. If one continues to maintain that the
hypothesis H: p = 0 is true after observing an r of .30 for a sample of 200,
he is forced to admit that the event observed, namely an r of .30 or larger,
is relatively rare. Naturally we expect to observe likely events, even though
unlikely events do occur. But common sense tells us to expect likely events
and not unlikely ones. Consequently, if we are forced to acknowledge an
event as quite unlikely in order to decide that H: p = 0 is true, we find it far
more reasonable to decide otherwise. A more reasonable decision is that
H: p = 0 is false, since values of p other than 0 make it reasonable to expect a
sample r either below -.140 or above .140.

Suppose, then, that we agree to decide H: p = 0 is true if r for a sample of
200 lies between -.140 and .140, and to decide H: p = 0 is false if the
value of r is above .140 or below -.140. We call this our decision rule.
Could this rule lead us into error? Most certainly, the
possibility of making an incorrect decision with this rule does exist. The
heart of the problem of hypothesis testing is the formulation of such decision
rules and the assessment of the probability that they will lead us into error.

FIG. 13.4 Representation of the area in the sampling distribution of r
for samples of 200 when p = 0 which lies more than 1.26 standard-deviation
units from the mean.


Suppose that in fact, but unknown to us, p is exactly 0. We have 
agreed to decide that H: p = 0 is false whenever an r outside the interval 
—.140 to .140 is obtained. In what percentage of an infinite number of 
random samples of size 200 will an r deviating from 0 by at least .140 be 
obtained when p is actually 0? In exactly 5 % of the samples. Consequently, 
if H: p = 0 is true, the decision rule we adopted would cause us to decide 
that H: p = 0 was false in 5 % of the samples— or with probability .05 — when 
in fact the hypothesis was true. 
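
The long-run claim in the preceding paragraph can be illustrated by simulation. The sketch below is our own illustration, not part of the original argument: it draws repeated samples of 200 independent pairs (so the population correlation is 0), computes r for each, and counts how often the decision rule rejects H. The number of replications (2000) and the seed are arbitrary assumptions.

```python
import random
from math import sqrt

def sample_r(n: int, rng: random.Random) -> float:
    """Pearson r for n independent (X, Y) pairs, so the population p is 0."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def type_i_rate(n: int = 200, cutoff: float = 0.140,
                reps: int = 2000, seed: int = 1) -> float:
    """Proportion of samples whose r falls outside -cutoff..cutoff."""
    rng = random.Random(seed)
    rejections = sum(abs(sample_r(n, rng)) > cutoff for _ in range(reps))
    return rejections / reps

rate = type_i_rate()
print(f"proportion of samples in which a true H was rejected: {rate:.3f}")
```

With a finite number of replications the proportion will only hover near .05; it approaches the theoretical value as the number of samples grows.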

The researcher investigating the relationship between creativity and 
anxiety found a correlation of .090 between the Uses of Things test and the 
Children’s Manifest Anxiety Scale in a random sample of size 200. This 
value of r lies 1.26 standard-deviation units (.090/.071 = 1.26) above 0. 
If H: p = 0 is true, our researcher has drawn an r which lies 1.26 standard
deviations from the mean of the sampling distribution of r (see Fig. 13.4). 
How often would one expect to obtain an r for a sample of 200 that lies 1.26 
standard deviations above or below the mean of this normal distribution? 
From Table B in Appendix A we see that 20.8% of the area under a normal 
curve lies more than 1.26 standard-deviation units from the mean. Therefore, 
a value of r deviating from 0 by at least .090 is to be expected in over 20% 
of the samples of size 200 from a population in which p = 0. It is not un- 
likely, then, to obtain an r of .090 from a population with p = 0. Conse- 
quently it would be unreasonable to conclude that H: p = 0 was false on 
the basis of an r of .090 in a sample of 200. 
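
The table lookup in this example can be reproduced as follows. This is a hedged sketch that uses the standard normal distribution from Python's standard library in place of Table B; the variable names are ours.

```python
from statistics import NormalDist

r_observed = 0.090
sigma_r = 0.071          # standard deviation of r under H: p = 0, n = 200

z_text = 1.26            # the standardized value used in the text
two_tailed_area = 2 * (1 - NormalDist().cdf(z_text))
print(f"area beyond +/-{z_text} standard deviations: {two_tailed_area:.3f}")

# Without rounding z, the area changes slightly but remains above 20%.
z_exact = r_observed / sigma_r
area_exact = 2 * (1 - NormalDist().cdf(z_exact))
print(f"z = {z_exact:.3f}, area = {area_exact:.3f}")
```

Either way, more than one sample in five from a population with p = 0 would yield an r at least this far from zero, which is why the researcher has no grounds for rejecting H.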

13.4

TYPE I ERROR, LEVEL OF SIGNIFICANCE, AND CRITICAL REGION


In this section we shall summarize the points made in the preceding section
and give the conventional names for many of the concepts presented there.
From the discussion in Sec. 13.3 we shall reconstruct four steps:

Step 1: A hypothesis to be tested is stated. In our example 
that hypothesis was H: p = 0. It has long been 
customary to call the hypothesis to be tested the null 
hypothesis. This convention arises from the fact that 
statistical hypothesis testing procedures arose within 
a philosophy of science that conceived of its role as
gathering evidence in attempts to nullify hypotheses.

We shall not use the term “null hypothesis” until later
in this chapter, when it will be necessary to distinguish 
between different hypotheses. 

Step 2: Assumptions are made that are necessary for deter- 
mining the sampling distribution of the statistic that 
estimates the parameter about which something is 
hypothesized. The sampling distribution of this statis- 
tic is determined for the case in which the hypothesis of 
step 1 is true. 

Step 3: A degree of risk of incorrectly concluding on the basis
of sample evidence that H is false is adopted. This
risk, stated as a probability, is denoted by a and is
called the level of significance of the hypothesis test
(or, occasionally, the “size” of the test). This usage
has made “significance test” synonymous with “hypothesis
test.” From the risk adopted, a set of values
of the sample statistic is determined that will lead one
to decide H is false if the sample yields such a value.
This set of values is called the critical region.

For example, in the illustration in Sec. 13.3 it was
decided that a risk of .05 was acceptable. In other
words, since no decision about p could be made with
certainty, it was considered acceptable to make the
probability equal to .05 of deciding H: p = 0 was false
when in fact it was true. The level of significance a of
the hypothesis test was thus taken to be .05.

By determining the point (.140) above which and
the point (-.140) below which 2.5% of the r's in
repeated samples of size 200 from a population with
p = 0 would fall, we found the two regions of “unlikely
r's given a true hypothesis H: p = 0.” These
two regions, which constitute the most unlikely 5%
of the sample r's one could obtain by sampling from a
population with p = 0, are the critical regions. The
critical regions for the illustration in Sec. 13.3 are
indicated in Figure 13.3. One portion of the critical
region lies from .140 to +1.00, and the second portion
lies from -.140 to -1.00. A critical region is sometimes
called a region of rejection because the occurrence
of a sample value that lies in the critical region
leads one to reject the hypothesis H: p = 0.

Step 4: A single sample is drawn from the population, the 
value of a statistic is observed, and a decision about 
the truth of H is made. This is the final step in the 
testing of the hypothesis stated in step 1. 
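
The four steps can be collected into a single routine. The sketch below is only an illustration under stated assumptions: the function name decide_rho_zero is ours, and the standard deviation of r under H is taken as 1/sqrt(n - 1), an assumption chosen because it gives about .071 for n = 200, the value used in the text.

```python
from math import sqrt
from statistics import NormalDist

def decide_rho_zero(r: float, n: int, alpha: float = 0.05) -> dict:
    """Apply the four steps to the hypothesis H: p = 0 (illustrative only)."""
    # Step 1: the hypothesis to be tested is H: p = 0.
    # Step 2: under H, r is treated as approximately normal with mean 0.
    # ASSUMPTION: sigma_r = 1/sqrt(n - 1), about .071 when n = 200.
    sigma_r = 1 / sqrt(n - 1)
    # Step 3: adopt the risk alpha; the critical region is |r| >= cutoff.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) * sigma_r
    # Step 4: observe r in a single sample and make the decision.
    return {"sigma_r": sigma_r, "cutoff": cutoff, "reject": abs(r) >= cutoff}

first = decide_rho_zero(0.090, n=200)               # the creativity-anxiety example
second = decide_rho_zero(0.340, n=200, alpha=0.01)  # a much larger sample r
print(first)
print(second)
```

For the sample r of .090 the routine does not reject H at the .05 level, in agreement with the conclusion reached in Sec. 13.3.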

The sample data must lead us to make one of two decisions about H:
“H is true” or “H is false.” The former decision is spoken of as “accepting
H,” the latter as “rejecting H.” From any sample it can never be concluded
with certainty that H: p = 0 is true or false; the best one can do is to make a
decision about H that has a high probability of being true.

If H is true and our sample leads us to accept H, a correct decision is 
made. If H is true and our sample leads us to reject H, an incorrect decision
is made. Such an incorrect decision is called a type I error or error of the 
first kind. (Later we shall meet a type II error.) A type I error is made 
when a true hypothesis H is rejected. It is, of course, impossible for a 
person to know whether his decision to reject H is correct or a type I error. 
To know this, it is necessary to know whether H is true or false; but if the 
truth is known about H, there is no need for inferential statistics. At best, 
one knows the probability— or proportion of times in the long run — of 
making a correct decision or a type I error.

In the hypothesis test of H: p = 0, a decision process was set up that
would cause one to reject H erroneously five times in 100 (or with probability
.05) in a long series of similar hypothesis-testing situations if H were true.
Hence, it is known that if H were true a type I error would occur with
probability .05 if any sample r outside the interval -.140 to .140 was regarded
as evidence that H: p = 0 was false. If H is indeed true, what would be the
probability of making a correct decision about it, i.e., accepting it? Since 
under these circumstances 95% of the r’s for samples of size 200 will fall 
between —.140 and .140, the probability of accepting H when it is true is .95. 

The size of the probability of a type I error can be controlled. We shall 
denote the probability of a type I error for any unspecified hypothesis-testing 
situation by a. In the test of H: p = 0, a was set equal to .05. We can
make a equal to such values as .20, .10, .01, .001, or even .125, if we choose.

FIG. 13.5 The critical region for testing the hypothesis that p = 0
with an a of .01 (n = 200).


Since a stands for the probability of making a certain type of incorrect 
decision, we would prefer to keep it small. It would probably rarely be 
acceptable to decide to accept or reject H with a plan that would commit a 
type I error with probability much greater than .10, i.e., with a greater than
.10. It is customary to let a equal .05, .01, or .001. We shall now see how
we could have tested H: p = 0 with an a of .01.

If p = 0, then the distribution of r for samples of size 200 is approximately
normal with mean 0 and standard deviation .071, as we saw before.
Both “large” and “small” values of r will cause us to question the truth
of the assertion that p = 0. So that we shall run a risk of exactly .01 of
rejecting H: p = 0 if it is true, we must determine the two numbers between
-1 and +1 beyond which only 1% of the sampling distribution of r lies
when p = 0, with .5% in each tail. These two halves of the critical region
are depicted in Fig. 13.5.

It can be determined from the table of the unit normal distribution that 
the probability is .005 that a normally distributed variable will lie more than 
2.58 standard deviations above the mean. Similarly, the probability is .005 
that a normally distributed variable will lie more than 2.58 standard deviations
below the mean. Hence, if we establish the critical region from
(2.58)(.071) = .184 to 1.00 and from -1.00 to (-2.58)(.071) = -.184, we
shall have a probability of only .01 of rejecting H: p = 0 if it is true.
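
The same computation can be repeated for any level of significance. The following sketch tabulates the critical values of r for several of the levels mentioned above, assuming the normal approximation with standard deviation .071; the function name critical_values is hypothetical.

```python
from statistics import NormalDist

SIGMA_R = 0.071  # standard deviation of r under p = 0 for n = 200 (from the text)

def critical_values(alpha: float, sigma_r: float = SIGMA_R) -> tuple:
    """Lower and upper critical values of r for a two-sided risk alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g., about 2.58 when alpha = .01
    return -z * sigma_r, z * sigma_r

for alpha in (0.10, 0.05, 0.01, 0.001):
    lower, upper = critical_values(alpha)
    print(f"a = {alpha:<5}  reject H if r < {lower:.3f} or r > {upper:.3f}")
```

The values printed may differ from the text's .140 and .184 by about .001, since the text rounds both the normal deviate and the standard deviation. The pattern is the point: the smaller the risk a, the farther from zero r must fall before H is rejected.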

It might be instructive to some who read this to see the above argument
in its mathematical form. If H: p = 0 is true, then

$$\frac{r - 0}{\sigma_r} = \frac{r}{.071} \sim N(0, 1),$$

where “$\sim N(0, 1)$” means “is distributed normally, with population mean 0
and population variance 1.” Therefore,

$$\text{prob}\left(-2.58 < \frac{r}{.071} < 2.58\right) = .99.$$

Multiplying the inequality in parentheses by .071 gives the following
expression:

$$\text{prob}(-.184 < r < .184) = .99.$$

Consequently, if the correlation coefficient for a sample of size 200 from
a bivariate normal distribution is above .184 or below -.184, the null
hypothesis H: p = 0 can be rejected at the .01 level of significance. Suppose
that for a sample of size 200, the value of r is .340. The obvious decision
can be stated in several equivalent ways:

1. “Reject H: p = 0 at the .01 level of significance.”

2. “Reject H: p = 0 with an a of .01.”

3. “Reject H: p = 0 at the 1% level of significance.”

4. “Reject H: p = 0 with a probability of .01 of making a type I
error.”

The importance of the assumption that the sample comes from a bivariate 
normal distribution has been investigated by various researchers: Norris and 
Hjelm (1961), Nefzger and Drasgow (1957), Binder (1959), Furfey (1958), 
LaForge (1958), and Milholland (1958). If you read one of these references, 
you would be well advised to read them all since no single reference on this
topic presents a balanced picture. 


13.5 

TYPE II ERROR, β, AND POWER


Quite literally, so far in this chapter we have related only half the story of 
statistical hypothesis testing. In this section, we shall present the rest of 
our account. 

The standard technique for testing the hypothesis H: p = 0 is to select 
a level of significance a, determine the critical values of r or z = (r - 0)/σ_r
(whichever you wish), draw a sample and compute r, and then accept or 
reject H. In the previous section, we showed how to measure the probability 
that H would be rejected when it was in fact true, i.e., the probability of a 
type I error. It was acknowledged that the decision “H is false” could be 
incorrect. Now we acknowledge that the decision to accept H, i.e., to 
conclude that “H is true,” could also be incorrect. In other words, we could 
falsely accept H, e.g., conclude that p = 0 when in fact p = .20. The error 
of accepting a false H is termed an error of the second kind or a type II error. 
Having acknowledged the possibility of committing a type II error, we now 
proceed to a discussion of the techniques for measuring the probability of 
an error of the second kind.

If the hypothesis H: p = 0 is false, some other hypothesis (an alternative
hypothesis) about the value of p must be true. We shall now make use of
the term null hypothesis for the original hypothesis p = 0 and the term
alternative hypothesis to describe some other hypothesis we could make about
the value of p, e.g., that p = .20. Henceforth, the null hypothesis will be
denoted by H 0 and the alternative hypothesis by H 1. We can seldom specify
a single alternative value of the parameter. Generally (though not always) the
alternative hypothesis is composite, i.e., it specifies many possible values of
the parameter, instead of simple, like the null hypothesis, in which a single
value is hypothesized. The following illustrate a simple null hypothesis and
a composite alternative:

null hypothesis: H 0 : p = 0;
alternative hypothesis: H 1 : p ≠ 0.

In the theory of hypothesis testing, it is held that one of two “states
of nature” may exist: either H 0 is true or H 1 is true; and it is agreed that after
inspecting a sample, one of two decisions will be reached: H 0 will be accepted
(hence, H 1 is rejected), or H 1 will be accepted (hence, H 0 is rejected). The
four possible combinations of these states of nature and decisions are
illustrated here along with a description of the validity of the decision.


                              State of nature
Decision              H 0 is true                H 1 is true

Reject H 0            Type I error               Correct decision
(Accept H 1)          (Probability = a)          (Probability = 1 - β)

Reject H 1            Correct decision           Type II error
(Accept H 0)          (Probability = 1 - a)      (Probability = β)


We have adopted the convention of calling the probability of committing
a type I error a. The probability of committing a type II error, i.e.,
of accepting H 0 when H 1 is true, will be denoted by β. We will now look at
an example of how β would be calculated.

Suppose an investigator wishes to test the null hypothesis H 0 : p = 0.
Also assume that he has no particular reason to suppose that p
deviates from zero in one direction rather than the other. He can afford
to draw a sample of 200 persons, and it is reasonable to assume that the
variables observed have a bivariate normal distribution.
He is willing to reject H 0 : p = 0 when it is true only five times per 100;
hence, a = .05.

FIG. 13.6 Illustration of the power of the test of H 0 : p = 0 against
H 1 : p ≠ 0 for the case in which p = .20 (n = 200 and a = .05).


The investigator would establish critical regions of r that lead to the
rejection of H 0 as was done in Fig. 13.3. A sample r of .140 or larger or
-.140 or smaller would be taken as evidence of the falsity of H 0. We know,
then, that if p is actually zero, there is one chance in twenty (a = .05) that
the investigator will reject H 0.

However, what if p is really .20? In this case, H 0 : p = 0 should be
rejected in favor of the conclusion that p is different from zero. But what is
the probability that H 0 will be rejected? This probability is the power of
the test when p = .20, and is depicted by the shaded area in Fig. 13.6.

The upper critical region for r is all values from .140 to 1.00. Hence,
the power of the hypothesis test to reject H 0 when p = .20 is the area above
.140 under the curve that represents the sampling distribution of r for samples
of size 200 when p = .20. This area is approximately 82% of the total area
under the curve on the right in Fig. 13.6. Thus the power is approximately
.82. [Actually there also exists an infinitesimal chance that H 0 will be rejected
in favor of H 1 when p = .20 due to a sample r below -.140! We disregard
this improbability here; however, see Kaiser (1960).] The area under the
curve on the right in Fig. 13.6 (the sampling distribution of r when p = .20)
below .140 is a measure of the probability that r will fail to exceed the critical
value even though H 0 is false; this area measures β, the probability of a
type II error. The area in question is about 18% of the total area under the
curve. Hence, β is approximately .18. Since if p is actually .20 we must
either commit a type II error or not, the probability of not committing the
error, i.e., the power of the test, is given by 1 - β = .82. Now try to
convince yourself that if p were equal to -.20, the same hypothesis-testing
procedure would run the same risk of a type II error and have the same power,
.82, as when p = .20.
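
The text reads these areas from Fig. 13.6. As a rough numerical check, the sketch below substitutes Fisher's z transformation, which treats atanh(r) as approximately normal with mean atanh(p) and standard deviation 1/sqrt(n - 3). That substitution is our assumption, not the method used to draw the figure, so the result should only be expected to land near the text's approximate value.

```python
from math import atanh, sqrt
from statistics import NormalDist

def power_two_sided(rho: float, n: int = 200, cutoff: float = 0.140) -> float:
    """Approximate P(r > cutoff or r < -cutoff) when the population value is rho."""
    z_dist = NormalDist(mu=atanh(rho), sigma=1 / sqrt(n - 3))
    upper = 1 - z_dist.cdf(atanh(cutoff))   # r above the upper critical value
    lower = z_dist.cdf(atanh(-cutoff))      # r below the lower critical value
    return upper + lower

power_pos = power_two_sided(0.20)
power_neg = power_two_sided(-0.20)
print(f"power when p =  .20: {power_pos:.2f}  (beta = {1 - power_pos:.2f})")
print(f"power when p = -.20: {power_neg:.2f}  (beta = {1 - power_neg:.2f})")
```

Under this approximation the power comes out a little above .80, in reasonable agreement with the figure-based .82, and it is identical for p = .20 and p = -.20, as the last sentence of the paragraph invites the reader to verify.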

It will further extend the notions being developed here if we determine
the power of the test of H 0 : p = 0 with a = .05 and n = 200 when p = .10
instead of .20. The critical regions of the test remain the same: -1.00 to
-.140 and .140 to 1.00. The sampling distribution of r for samples of size
200 when p = 0 is unchanged from previous discussions, and it appears along
with the sampling distribution of r for n = 200 when p = .10 in Fig. 13.7.

FIG. 13.7 Illustration of the power of the test of H 0 : p = 0 against
H 1 : p ≠ 0 for the case in which p = .10 (n = 200 and a = .05).


Suppose one had chosen to test H 0 : p = 0 against H 1 : p ≠ 0 with an
a of .10 and a sample of 200 paired observations. By referring to Fig. 13.6,
try to determine whether the power of this test is greater or less than the
power of the test with a = .05 when p = .20.

From an exact measure of the area under the curve on the right above
the critical value of .140 in Fig. 13.7 it can be shown that the power of the
test of H 0 : p = 0 when p = .10 is .29. Of course it follows that β = .71.
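
Both questions raised here (the power when p = .10, and the effect of moving a from .05 to .10) can be explored with one small table. This is a sketch under two assumptions of ours: critical values are taken as a normal deviate times .071, and the distribution of r for nonzero p is approximated with Fisher's z transformation rather than measured from Fig. 13.7.

```python
from math import atanh, sqrt
from statistics import NormalDist

N = 200
SIGMA_R = 0.071  # standard deviation of r when p = 0, from the text

def cutoff(alpha: float) -> float:
    """Upper critical value of r for a two-sided test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2) * SIGMA_R

def power(rho: float, alpha: float) -> float:
    """Approximate probability of rejecting H 0 : p = 0 when p equals rho."""
    z_dist = NormalDist(mu=atanh(rho), sigma=1 / sqrt(N - 3))
    c = atanh(cutoff(alpha))
    return (1 - z_dist.cdf(c)) + z_dist.cdf(-c)

for rho in (0.10, 0.20):
    for alpha in (0.05, 0.10):
        pw = power(rho, alpha)
        print(f"p = {rho:.2f}, a = {alpha:.2f}: power = {pw:.2f}, beta = {1 - pw:.2f}")
```

For p = .10 and a = .05 the approximation gives a power close to the .29 stated above. In every row, raising a from .05 to .10 widens the critical region and so raises the power.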

It seems almost never to be the case in research in education and psychology
that the power of a hypothesis test for just one or two alternative
values of the parameter is sufficient information. Generally one would want
to know the power of the test for several alternative values of the
parameter. One can plot the values of the power against the values of the
parameter, and then connect the points with a smooth line. The resulting
curve is the power curve of the test; see Fig. 13.8. Note in Fig. 13.8 that
the power of the test when p = .20 and when p = .10 agrees with the values
calculated in Figs. 13.6 and 13.7.

The power of the test increases as the true value of p departs further from
zero. This is comforting to know, but it is a
contingency not under the control of the investigator since he does not “set”
the true value of p. However, the sample size n and the level of significance
a are under his control, relatively at least. For any given value
of p other than zero, the power of the test of H 0 : p = 0 increases as n is
increased (e.g., from 10 to 100) and also increases as a is increased (from .01
to .05, say).

The following can be said about hypothesis-testing procedures in general:

1. For a given value of the parameter being tested, e.g., p = .40, the
power of the test of H 0 increases as n, the sample size, increases.

2. For a given value of the parameter being tested, e.g., p = .40, the
power of the test of H 0 increases as a, the probability of rejecting a
true null hypothesis, increases, e.g., from .01 to .05.

These two relationships are quite important since to some extent a
and n can be controlled by the investigator. It might be advisable in some
circumstances to run a risk of a type I error as large as .10, i.e., a = .10,
to insure a reasonable power for a test. The third relationship we shall state
is much less under the control of the investigator:

3. For fixed values of a and n, the power of the test of H 0 increases
as the true value of the parameter being tested deviates further from
the value hypothesized for it in H 0. For example, if n = 100 and
a = .01, the power of the test of H 0 : p = 0 is greater when p
actually equals .60 than when p equals .40 or -.40.
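
The three relationships can be seen side by side in a small table of approximate powers. The sketch below is illustrative only: it assumes Fisher's z transformation (atanh(r) approximately normal with standard deviation 1/sqrt(n - 3)) and sets the critical values on that transformed scale, which is a convenience of ours rather than the procedure described in this chapter.

```python
from math import atanh, sqrt
from statistics import NormalDist

def power(rho: float, n: int, alpha: float) -> float:
    """Approximate power of the two-sided test of H 0 : p = 0 (Fisher z scale)."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)      # critical normal deviate
    shift = atanh(rho) * sqrt(n - 3)         # mean of the test statistic under rho
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)

print("  n     a     p    power")
for n in (10, 100):
    for alpha in (0.01, 0.05):
        for rho in (0.40, 0.60):
            print(f"{n:4d}  {alpha:.2f}  {rho:.2f}  {power(rho, n, alpha):.2f}")
```

Reading down the table, power rises with n (relationship 1), with a (relationship 2), and with the distance of p from zero (relationship 3); replacing .40 by -.40 leaves the power unchanged.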

The popular notion among practicing researchers is that the statistician 
is a man who tells them “how large a sample to take.” Presumably he derives 
this decision about sample size from studying cost per observation, costs of 
committing type I and type II errors, and the power of the test for different 
sample sizes and particular alternative values of the parameter about which 
a hypothesis is to be tested. The theory — known as the Neyman-Pearson 
hypothesis-testing theory— that gave us the notions of type II errors and power 
is very accommodating when these costs and specific alternative values of 
the parameter can be specified. However, in research in education and the 
social sciences it is rare that they can be specified with any confidence. We 
suspect that most statisticians consulting with researchers in these disciplines 
have had experiences similar to ours. We usually find ourselves advising
persons to draw the largest sample they can afford to take; then we determine
whether the sample size they tell us they are capable of taking is unnecessarily
large! If they can take a sample so large that the power of their test of H 0 is
.97, say, even when the true value of the parameter is only slightly different
from the value specified in H 0, then they can be safely advised not to draw
such a wastefully large sample. It may well be true that the power of their
test would drop to only .90 if they took a sample half as large as the largest
possible sample. If so, we would not hesitate to advise them to reduce the
size of their sample. Thus, as our inquiries in education and the social
sciences are presently constituted, the concept of the power of a test is more
useful as a signal that too large a sample may be drawn than it is useful as
the determiner of sample size.

FIG. 13.8 Power curve for the test of H 0 : p = 0 against H 1 : p ≠ 0
for n = 200 and a = .05.

13.6 

NONDIRECTIONAL AND DIRECTIONAL ALTERNATIVES: “TWO-TAILED VS. ONE-TAILED TESTS”


A hypothesis test can be designed to yield evidence that will either convince
the researcher that p is positive or lead him to continue to believe that p
is zero.

One consequence of adopting the directional alternative H 1 : p > 0 to the
null hypothesis is that now the critical region for rejection of H 0 in favor
of H 1 begins at that value of r exceeded by 100(a)% of the area in the sampling
distribution of r when p = 0. Only large positive values of r
will lead one to decide in favor of H 1; hence, the critical region for
rejection of H 0 is in the right-hand tail of the sampling distribution of r for
p = 0, as indicated in Fig. 13.9. In Fig. 13.9 the critical region extends from
.117 to 1.00.

A very small value of r, say -.40, certainly does not favor the hypothesis
that p > 0 over the hypothesis that p = 0. Since only the two conditions
p = 0 or p > 0 are covered by the hypotheses, an r of -.40 would imply
the truth of p = 0 rather than p > 0.

FIG. 13.9 Illustration of critical region for testing H 0 : p = 0 against
H 1 : p > 0 for n = 200 and a = .05. The critical region begins at r = .117.

The location of the critical region in one tail of the sampling distribution
of the statistic under the null hypothesis has made popular the
phrase one-tailed test for a significance test of a directional hypothesis. This
usage is somewhat ambiguous and potentially misleading. The important 
distinction is between nondirectional and directional alternative hypotheses; 
whether a test statistic has grown out of the history of statistics in such a 
manner that one or two tails of a sampling distribution lie above the critical 
values of a statistic is quite arbitrary. We shall see in Chapter 15, for example,
that a nondirectional hypothesis about a set of population means is
tested by referring a test statistic to one tail of the F-distribution.
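
The effect of a directional alternative on the critical value of r can be computed directly. The sketch below assumes, as in the rest of the chapter, that r is approximately normal with mean 0 and standard deviation .071 when p = 0 and n = 200; the variable names are ours.

```python
from statistics import NormalDist

SIGMA_R = 0.071  # standard deviation of r under p = 0 for n = 200
ALPHA = 0.05

# Directional alternative H 1 : p > 0: all of the risk a lies in the upper tail.
z_one = NormalDist().inv_cdf(1 - ALPHA)
# Nondirectional alternative H 1 : p != 0: the risk a is split between two tails.
z_two = NormalDist().inv_cdf(1 - ALPHA / 2)

one_tailed_cutoff = z_one * SIGMA_R
two_tailed_cutoff = z_two * SIGMA_R

print(f"directional test:    reject H 0 if r > {one_tailed_cutoff:.3f}")
print(f"nondirectional test: reject H 0 if r > {two_tailed_cutoff:.3f} "
      f"or r < {-two_tailed_cutoff:.3f}")
```

The directional critical value (about .117) is closer to zero than the nondirectional one, so a positive sample r reaches the critical region more easily, at the price of having no critical region at all for negative values of r.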

One sometimes reads that the hypotheses tested in a directional situation
are H 0 : p ≤ 0 against H 1 : p > 0. It is then maintained that sufficiently
large sample r's favor H 1, and small or large negative r's favor H 0. Actually,
logic does not support this particular statement of H 0 and H 1. H 0 must
assert that p = 0, not that p ≤ 0. Anything else leads to some interesting
inconsistencies. Note that H 0 : p ≥ 0 and H 1 : p < 0 are every bit as acceptable
as H 0 : p ≤ 0 and H 1 : p > 0. However, for fixed n and a, the probability
of deciding in favor of H 0 : p ≥ 0 when p is actually .10 is far greater than
the probability of deciding in favor of H 1 : p > 0. It is indeed an unacceptable
state of affairs when the power of a test to decide that p is above zero
when p is .10 depends greatly on the arbitrary choice between H 0 : p ≥ 0,
H 1 : p < 0, or H 0 : p ≤ 0, H 1 : p > 0. (See Rozeboom, 1960.)

A debate over the merits of testing directional versus nondirectional 
hypotheses in research in the behavioral sciences raged for a few years in 
the 1950’s and 1960’s. We shall not take the space to recapitulate the issues 
here. You will find the major issues presented in the following references: 
Burke (1953, 1954); Goldfried (1959); Hick (1952); Jones (1952, 1954); 
Kimmel (1957); Marks (1951, 1953); and Peizer (1967). 

Our impression is that the potential for misusing directional hypothesis 
tests is great. To be perfectly legitimate, for example, one who hypothesizes 
that p = 0 against p > 0 must look the other way and refuse to budge from 
the belief that p is zero even if a sample of 1000 yields an r of —.99. We do 
not trust ourselves to be quite that orthodox, and we would not be astonished 
to observe others yielding to temptation.

Among recent attempts to clear away the practicing researcher's confusions
about statistical hypothesis testing, the most successful in our opinion is
William Kruskal's contribution to the International Encyclopedia of the Social
Sciences (1968) entitled “Tests of Significance.” You could probably
profitably read Kruskal's article when you have finished studying this chapter
and Chapter 14. Also see Nunnally (1960), Grant (1962), and Wilson and
Miller (1964).


1. Give definitions of each of the following:

a. Null hypothesis, H 0.

b. Alternative hypothesis, H 1.

c. Type I error.

d. Type II error.

e. Level of significance, a.

f. Power of a test, 1 - β.

g. Critical region. 


2. In which of the following instances can H 0 : p = 0 be rejected with certainty, i.e.,
in which instances is the reported sample r an impossibility given that p is zero?

a. n = 10, r = .60
b. n = 100, r = 1.00
c. n = 1000, r = .50


* ' ,pe ' *"”■ a ‘H* 1 

it. ;r, 


True value of , Reremhee', W on e 


c P 
d , . 


p -f 0 


Reject H 0
Reject H 0
Do not reject H 0
Do not reject H 0


the .01 level of significance The samol^”' ,W ’ c l v,l,h a «mpl c of n = 50 
«rcted in favor of Jf \vha\ fc ,h c ‘ ufnc,eml y ^rge that //, was 

m "" J? C.n Le <Z” T,Zte^2 T * '«* » *"» »» com- 

4 '—to „ . T " m0 ' ” ■*« 

IfcnLn.cor^.K ‘"’’Tt'”"’ ’'° m * 

that if p = 0, r will be distributed
approximately normally with a mean of zero and a standard deviation of .071.
Further, he decides to reject H 0 : p = 0 if r is above .10 or below -.10. What is
the probability that he will commit a type I error? (Hint: What percent of the 
area under a normal curve with mean zero and standard deviation .071 lies 
above .10 and below -.10?) 

6. In each of the following instances indicate whether the critical region for rejection 
of H 0 lies in the upper (right) tail, lower (left) tail, or is divided between both 
tails of the sampling distribution of r for p = 0:

a. H 0 : p = 0, H 1 : p ≠ 0

b. H 0 : p = 0, H 1 : p > 0

c. H 0 : p = 0, H 1 : p < 0

7. Researcher Rowe is testing H 0 : p = 0 at the a = .05 level with a sample of
size n = 25, and he sets his critical values of r appropriately. Researcher Null
is testing H 0 : p = 0 at the a = .05 level with a sample of n = 100 and he sets
his critical values of r appropriately.

a. Does Rowe or Null have the larger probability of committing a type I error,
or is this probability the same for both?

b. If p is actually .10, does Rowe or Null have the larger probability, β, of
committing a type II error?

c. Which researcher is performing a significance test that has greater power to
reject H 0 if p = -.20?

8. From Fig. 13.8, estimate the power of the test of H 0 : p = 0 against H 1 : p ≠ 0
for n = 200 and a = .05 when:

a. p = .05 b. p = .25 c. p = -.25 d. p = .40

9. Refer to Fig. 13.7. Suppose that in testing H 0 : p = 0 against H 1 : p ≠ 0 with a
sample of n = 200, the investigator sets his critical values at r = -.30 and below
and r = .30 and above. Thus he is adopting an extremely small a. If p is
actually .10, approximately how large is the probability that he will commit a
type II error, i.e., accept H 0 even though it is false?


14.1

INTRODUCTION 

At one time or another someone has studied the inferential properties of 
almost every statistic known. Using either strictly mathematical techniques 
or, occasionally, empirical methods, statisticians have derived sampling
distributions to fill most practical needs. In this chapter, the inferential
properties of only the more frequently used statistics will be presented; thus
the word “selected” in the title of this chapter is appropriate. Where possible,
the techniques of both testing the significance of a statistic and constructing
a confidence interval around it will be given. The statistics with which we
shall deal fall into four classes: means, variances, correlation coefficients,
and frequency data.

The discussion of the inferential properties of each statistic will take the
following form: (a) statement of the null hypothesis and the alternative
hypothesis H 1 (the alternative hypothesis will always be “nondirectional,”
so that modifications of critical values are necessary in the event a “one-sided”
test is desired); (b) statement of the assumptions made in making the test;
(c) identification of the sample statistic employed in testing H 0 and H 1;
(d) statement of the sampling distribution of the test statistic under both H 0
and H 1; (e) determination of critical values of the test; (f) construction of
confidence intervals around the sample statistic; (g) an illustration; (h)
special considerations, if any.

The first class of hypotheses we shall consider deals with population 
means. 


14.2 

INFERENCES ABOUT THE 
MEAN, $\mu$, OF A POPULATION


a. The hypothesis to be tested is that the mean $\mu$ of a population is
equal to some real number $a$. The alternative hypothesis is, of course, that
$\mu$ is different from $a$:

$$H_0\colon \mu = a$$
$$H_1\colon \mu \neq a.$$

b. It is assumed that the variable $X$ has a normal distribution in the
population sampled. One need not know the value of $\sigma_x^2$.


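c. $H_0$ is tested by means of the test statistic

$$t = \frac{\bar{X}_{.} - a}{s_x/\sqrt{n}}, \tag{14.1}$$

where

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X}_{.})^2}{n - 1}}.$$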


d. If $H_0\colon \mu = a$ is true, $t$ in Eq. (14.1) has the Student’s $t$-distribution
with $n - 1$ df (degrees of freedom). When $H_1$ is true, i.e., when $\mu$ is actually
equal to some value $b$ that is different from $a$, then the sampling distribution
of $t$ in Eq. (14.1) has approximately the shape and variability of Student’s
$t$-distribution with $n - 1$ df (strictly, it is a noncentral $t$-distribution), but it
has a mean approximately equal to $(b - a)/(\sigma_x/\sqrt{n})$.
For example, the distribution of $t = (\bar{X}_{.} - 0)/(s_x/\sqrt{n})$ is depicted in Fig. 14.1
both for the case in which $\mu = 0$ and $\mu = 2$; $n$ and $\sigma_x^2$ are both equal to 30.

e. The critical values for testing $H_0$ at the $\alpha$-level of significance with the
test statistic $t = (\bar{X}_{.} - a)/(s_x/\sqrt{n})$ are the $100[1 - (\alpha/2)]$ percentile point in
Student’s $t$-distribution with $n - 1$ df and the negative of this percentile
point:

$$-{}_{1-(\alpha/2)}t_{n-1} \quad \text{and} \quad {}_{1-(\alpha/2)}t_{n-1}.$$

A value of $t = (\bar{X}_{.} - a)/(s_x/\sqrt{n})$ falling below the negative critical value
or above the positive critical value constitutes evidence for rejecting
$H_0\colon \mu = a$ in favor of $H_1$.

FIG. 14.1 Sampling distributions of $t = (\bar{X}_{.} - 0)/(s_x/\sqrt{n})$ for the case
in which $H_0\colon \mu = 0$ is true and the case in which $H_1\colon \mu = 2$ is true.
The values of $n$ and $\sigma_x^2$ are both 30 ($\alpha = .05$).

f. The $100(1 - \alpha)\%$ confidence interval on $\mu$ is constructed as follows:

$$\bar{X}_{.} \pm {}_{1-(\alpha/2)}t_{n-1}\,\frac{s_x}{\sqrt{n}}. \tag{14.2}$$

g. In the 1930’s a study was performed on the effects on intelligence of
placing illegitimate children born of average mothers into foster homes.
The mean IQ (as measured by the Kuhlmann revision of Binet’s tests) of some
175 children so placed between the ages of 6 months and 1 year was about
115. Critics raised the question of whether the mean score in the population
of all infants is actually 100 as was presumed. (Studies revealed the
population mean $\mu$ to be substantially above 100.)

We wish to test the hypothesis $H_0$ that the mean Kuhlmann IQ in the
population of all infants in the United States is 100 against the alternative
hypothesis that it is not:

$$H_0\colon \mu = 100 \qquad H_1\colon \mu \neq 100.$$

The probability of falsely rejecting $H_0$ will be set at .01.

A random sample of 25 infants yielded Kuhlmann IQ’s with $\bar{X}_{.} = 113.64$
and $s_x = 12.40$. Thus,

$$t = \frac{113.64 - 100}{12.40/\sqrt{25}} = 5.50.$$

The critical values for $t$ are $-{}_{.995}t_{24} = -2.797$ and ${}_{.995}t_{24} = 2.797$.
Hence, we see that $H_0$ may be rejected in favor of $H_1$ at the .01 level of
significance. Using Eq. (14.2) we find the 99% confidence interval on $\mu$ is

$$113.64 \pm 2.797\,\frac{12.40}{\sqrt{25}} = 113.64 \pm 6.94 = (106.70, 120.58).$$
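The arithmetic of this illustration can be checked with a short sketch. It is only an illustration under stated assumptions: the function name `one_sample_t` and its argument names are hypothetical, and the critical value 2.797 is taken from the text's table rather than computed.

```python
import math

def one_sample_t(mean, sd, n, mu0, t_crit):
    """Test H0: mu = mu0 with Eq. (14.1) and build the interval of Eq. (14.2).

    t_crit is the tabled 100[1 - (alpha/2)] percentile of Student's t with
    n - 1 df; it is supplied by the caller, not computed here.
    """
    se = sd / math.sqrt(n)        # estimated standard error of the mean
    t = (mean - mu0) / se         # Eq. (14.1)
    half_width = t_crit * se      # half-width of the interval in Eq. (14.2)
    return t, abs(t) > t_crit, (mean - half_width, mean + half_width)

# Values from the Kuhlmann IQ illustration in the text.
t, reject, ci = one_sample_t(mean=113.64, sd=12.40, n=25, mu0=100.0, t_crit=2.797)
print(round(t, 2), reject, [round(v, 2) for v in ci])
```

Passing the critical value in as an argument mirrors the table lookup the text describes and keeps the sketch free of any distribution library.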
h. Although the assumption of a normal parent population is made in
testing hypotheses about $\mu$ when $\sigma_x^2$ is not known, violations of the assumption
have little effect upon either the level of significance or the power of the
two-tailed $t$-test (see Srivastava, 1958). However, nonnormality of the population
sampled can have serious effects on one-tailed $t$-tests of directional hypotheses.

Convenient tables for the calculation of the power of the $t$-test can be
found in Wine (1964, pp. 254-60).


14.3 

INFERENCES ABOUT $\mu_1 - \mu_2$
USING INDEPENDENT 
SAMPLES 


a. The hypothesis tested is that the difference between the means of
two populations, $\mu_1 - \mu_2$, is equal to zero against the alternative hypothesis
that it is different from zero:

$$H_0\colon \mu_1 - \mu_2 = 0$$
$$H_1\colon \mu_1 - \mu_2 \neq 0.$$


b. It is assumed that $X_1$ is normally distributed with mean $\mu_1$ and
variance $\sigma_x^2$ and that $X_2$ is normally distributed with mean $\mu_2$ and the same
variance $\sigma_x^2$. The assumption of equal variances in the two populations is
referred to as the assumption of homogeneous variances or homoscedasticity
(literally, “same spread”). Furthermore, it is assumed that a sample of
size $n_1$ is randomly drawn from population 1 and that an independent sample
of size $n_2$ is randomly drawn from population 2.

The major consequence of this assumption of independent samples is
that the two sample means, $\bar{X}_{.1}$ and $\bar{X}_{.2}$, will be perfectly uncorrelated across
infinitely many pairs of samples. The independence assumption would be 
violated if, for example, sample 1 was a random sample of 10-year-old boys 
and sample 2 was composed of their sisters. The two means of brother-sister 
paired samples would correlate on most variables one might observe. 


c. $H_0$ is tested against $H_1$ by means of the following test statistic:

$$t = \frac{\bar{X}_{.1} - \bar{X}_{.2}}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \tag{14.3}$$

where $\bar{X}_{.1}$ and $\bar{X}_{.2}$ are the means of the samples from populations 1 and 2,
respectively,
$s_1^2$ and $s_2^2$ are the unbiased estimates from samples 1 and 2 of the
common population variance $\sigma_x^2$, and
$n_1$ and $n_2$ are the sizes of samples 1 and 2.
The critical values against which a $t$ of 2.34 is compared are
$-{}_{.975}t_{48} = -2.01$ and ${}_{.975}t_{48} = 2.01$.

Hence we see that $H_0\colon \mu_1 - \mu_2 = 0$ can be rejected at the .05 level of
significance. Indeed, a value of $t$ deviating from 0 by more than 2.34 units has a
probability $p$ of approximately .03 if $H_0$ is true.

The 95% confidence interval on $\mu_1 - \mu_2$ can be constructed from Eq.
(14.4):

$$(\bar{X}_{.1} - \bar{X}_{.2}) \pm {}_{.975}t_{48}\,s_{\bar{X}_{.1} - \bar{X}_{.2}} = 1.65 \pm 2.01(.705) = (0.23, 3.07).$$
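As a rough check on the figures quoted in this illustration, the sketch below recomputes $t$ and the interval from the reported mean difference (1.65) and standard error (.705), and also shows the pooled standard error in the denominator of Eq. (14.3). The function names are hypothetical, and the sample variances in the last call are made-up values used only to exercise the formula; they are not the data of the original study.

```python
import math

def pooled_standard_error(var1, var2, n1, n2):
    """Denominator of Eq. (14.3): square root of pooled variance times (1/n1 + 1/n2)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

def t_and_interval(mean_diff, se, t_crit):
    """Return the t ratio and the interval mean_diff +/- t_crit * se."""
    t = mean_diff / se
    return t, (mean_diff - t_crit * se, mean_diff + t_crit * se)

# Figures quoted in the text: difference 1.65, standard error .705, tabled t 2.01.
t, ci = t_and_interval(mean_diff=1.65, se=0.705, t_crit=2.01)
print(round(t, 2), [round(v, 2) for v in ci])

# Hypothetical variances and sample sizes, only to exercise the pooled formula.
example_se = pooled_standard_error(var1=4.0, var2=6.0, n1=10, n2=10)
print(example_se)
```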

h. Violation of the assumption of normality in the $t$-test of $H_0\colon \mu_1 -
\mu_2 = 0$ has been shown to have only trivial effects on the level of significance
and power of the test and hence should be no cause for concern (Boneau,
1960; Scheffé, 1959, chap. 10).

The effects of violation of the homogeneous variances assumption can
be serious depending upon $n_1$ and $n_2$. If $n_1$ and $n_2$ are equal, violation of the
homogeneous variances assumption is unimportant and need not concern us
(Box, 1954a, b; Scheffé, 1959, chap. 10). This fact is a compelling motive
for selecting samples equal in size when possible in using the technique in
this section. Whenever the variances of populations 1 and 2 are different
and $n_1$ and $n_2$ are not equal, probabilities of type I and type II errors may be
quite different from what one imagines them to be (see Sec. 15.13). When a
study in which $\mu_1 - \mu_2$ is to be estimated cannot be designed so that $n_1 = n_2$,
and one suspects that the two populations have substantially different
variances, recourse should be made to methods developed by Welch (1937)
or Gronow (1951). The problem of testing the significance of the difference
between two means when the population variances are unequal has been
referred to as the Behrens-Fisher problem (see Fisher, 1959, pp. 93-97).
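For readers who want a concrete picture of an unequal-variances procedure, the sketch below uses the approximation commonly called the Welch-Satterthwaite method: the two variances are not pooled, and the degrees of freedom are estimated from the data. This is an assumption on our part about which approximation to show; it is not necessarily the exact procedure of the Welch (1937) or Gronow (1951) papers cited above. The function name and the sample figures are hypothetical.

```python
import math

def welch_t(mean1, mean2, var1, var2, n1, n2):
    """Unpooled t ratio and approximate df (Welch-Satterthwaite approximation)."""
    a = var1 / n1                      # estimated variance of the first sample mean
    b = var2 / n2                      # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics with unequal variances and unequal n.
t, df = welch_t(mean1=10.0, mean2=8.0, var1=4.0, var2=16.0, n1=10, n2=5)
print(round(t, 3), round(df, 2))

# With equal variances and equal n the approximate df reduce to n1 + n2 - 2.
_, df_equal = welch_t(0.0, 0.0, 5.0, 5.0, 10, 10)
print(df_equal)
```

The resulting $t$ would be referred to a Student’s $t$ table at the (generally fractional) df returned, usually rounded down.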


14.4 

INFERENCES ABOUT $\mu_1 - \mu_2$
USING DEPENDENT SAMPLES 

a. Population 1 has mean $\mu_1$ and population 2 has mean $\mu_2$. The null
hypothesis to be tested is the same as in Sec. 14.3:

$$H_0\colon \mu_1 - \mu_2 = 0$$
$$H_1\colon \mu_1 - \mu_2 \neq 0.$$

b. It is assumed that samples 1 and 2 are randomly drawn from normal
populations with the same variance $\sigma_x^2$. In this instance the samples need
not be independent, i.e., there may exist a correlation between $\bar{X}_{.1}$ and $\bar{X}_{.2}$
over repeated pairs of samples. The following are examples of dependent
samples: sample 1 is a sample of one-year-old infants and sample 2 consists
of the fraternal twin-mates of the children in sample 1; sample 1 is a group
of boys and sample 2 is the group of their sisters; sample 1 is the collection
of scores on a reaction-time test made by a group of persons before
administration of a drug and sample 2 is the collection of scores made by the
same persons after taking the drug.


c. It will always be possible to “pair off” data from two dependent
samples. The pairs may be defined by “brother-sister,” “before-after,”
“twin 1, twin 2,” “matched partner 1, matched partner 2,” etc. Hence, data
gathered from dependent samples will be in the form of $n$ pairs of
observations $X_{i1}$ and $X_{i2}$ for $i = 1, \ldots, n$. This pairing of the data from
dependent samples will be used to test the hypothesis that $\mu_1 - \mu_2 = 0$. The
hypothesis that $X_1$ and $X_2$ have the same mean, i.e., that $\mu_1 = \mu_2$, is equivalent
to the hypothesis that $X_1 - X_2$ has a mean of 0 in the population. The
difference $X_1 - X_2$ between the normally distributed variables $X_1$ and $X_2$ is
itself normally distributed; thus the techniques in Sec. 14.2 can be employed
to test the hypothesis that the $n$ differences $X_{i1} - X_{i2}$ can be considered a
random sample from a normally distributed population with mean $\mu_1 - \mu_2$
equal to zero.

Denote $X_{i1} - X_{i2}$ by $d_i$, the difference between the paired observations
from samples 1 and 2. The test statistic is

$$t = \frac{\bar{d}_{.}}{s_d/\sqrt{n}}, \tag{14.5}$$

where

$$\bar{d}_{.} = \frac{\sum_{i=1}^{n} (X_{i1} - X_{i2})}{n} = \frac{\sum_{i=1}^{n} d_i}{n},$$

the average of the $n$ difference scores, and

$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d}_{.})^2}{n - 1}}$$

is the standard deviation of the $n$ difference scores, and $n$ is the number
of pairs of observations.


d. If $H_0\colon \mu_1 - \mu_2 = 0$ is true, then $t$ in Eq. (14.5) will follow the Student’s
$t$-distribution with $n - 1$ df. If $H_1$ is true, the sampling distribution of $t$ will
be shifted in a direction and by an amount depending partially on the size of
$\mu_1 - \mu_2$.

e. The critical values for testing $H_0$ against $H_1$ at the $\alpha$-level of
significance by means of the $t$-statistic in Eq. (14.5) are as follows:

$$-{}_{1-(\alpha/2)}t_{n-1} \quad \text{and} \quad {}_{1-(\alpha/2)}t_{n-1}.$$

f. The $100(1 - \alpha)\%$ confidence interval on $\mu_1 - \mu_2$ around $\bar{d}_{.}$ is
constructed as follows:

$$\bar{d}_{.} \pm {}_{1-(\alpha/2)}t_{n-1}\,\frac{s_d}{\sqrt{n}}. \tag{14.6}$$


g. Webster and Bereiter (1963) reported data on personality changes in
100 college women from the Freshman to Senior year. A 60-item personality
scale was administered in the Freshman and Senior years to the same 100
women. The first set of 100 scores constitutes sample 1; the 100 Senior-year
scores constitute sample 2. There are 100 pairs of “before-after” scores.
From these data 100 difference scores are formed by subtracting Senior
scores from Freshman scores: $d_i = X_{i1} - X_{i2}$. The mean and standard
deviation of these 100 $d$ scores are:

$$\bar{d}_{.} = \frac{\sum_{i=1}^{100} (X_{i1} - X_{i2})}{100} = -7.02, \qquad s_d = \sqrt{\frac{\sum_{i=1}^{100} (d_i - \bar{d}_{.})^2}{99}} = 8.02.$$

We shall test $H_0\colon \mu_1 - \mu_2 = 0$ at the .01 level of significance. The value
of $t$ in Eq. (14.5) is

$$t = \frac{\bar{d}_{.}}{s_d/\sqrt{n}} = \frac{-7.02}{8.02/\sqrt{100}} = -8.75,$$

which lies far below the lower critical value of $-{}_{.995}t_{99} = -2.64$. In fact,
the probability $p$ of obtaining a $t$ of $-8.75$ or less given a true null hypothesis
is substantially smaller than even .001. Thus we can confidently reject the
null hypothesis that these two dependent samples of 100 observations each
could have been randomly sampled from two normal populations with the
same mean. There is overwhelming evidence that a “gain” takes place on
the personality inventory from the Freshman to the Senior years.

The 99% confidence interval on $\mu_1 - \mu_2$ can be found from Eq. (14.6):

$$\bar{d}_{.} \pm {}_{.995}t_{99}\,\frac{s_d}{\sqrt{n}} = -7.02 \pm (2.64)\frac{8.02}{\sqrt{100}} = (-9.14, -4.90).$$
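The dependent-samples computation above can be sketched in a few lines. The helper names are hypothetical; the summary figures ($\bar{d}_{.} = -7.02$, $s_d = 8.02$, $n = 100$) and the tabled value 2.64 come from the illustration, while the four score pairs at the end are invented solely to show how difference scores would be formed from raw data.

```python
import math
import statistics

def paired_t_from_summary(d_bar, s_d, n, t_crit):
    """Eq. (14.5) and the interval of Eq. (14.6) from summary statistics."""
    se = s_d / math.sqrt(n)                 # standard error of the mean difference
    t = d_bar / se
    return t, (d_bar - t_crit * se, d_bar + t_crit * se)

def difference_summary(sample1, sample2):
    """Mean, standard deviation (n - 1 divisor), and count of the paired differences."""
    d = [x1 - x2 for x1, x2 in zip(sample1, sample2)]
    return statistics.mean(d), statistics.stdev(d), len(d)

# Summary figures from the Webster and Bereiter illustration.
t, ci = paired_t_from_summary(d_bar=-7.02, s_d=8.02, n=100, t_crit=2.64)
print(round(t, 2), [round(v, 2) for v in ci])

# Hypothetical raw pairs, only to show how d-bar and s_d are obtained.
d_bar, s_d, n_pairs = difference_summary([5, 7, 6, 9], [6, 9, 9, 10])
print(d_bar, round(s_d, 4), n_pairs)
```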


h. A common misapplication of inferential statistical techniques is to
apply the techniques of Sec. 14.3 when the techniques of this section are
appropriate; in other words, researchers often have dependent samples, fail
to recognize this fact, and inappropriately apply the $t$-test for independent
groups to test the hypothesis that $\mu_1 - \mu_2 = 0$. The difference between the
independent and dependent groups’ $t$-tests becomes apparent when one
observes the standard errors of the difference between uncorrelated and
correlated means. If $\bar{X}_{.1}$ and $\bar{X}_{.2}$ are uncorrelated (estimated from
independent groups), the standard error of the difference between the two means is

$$\sigma_{\bar{X}_{.1} - \bar{X}_{.2}} = \sqrt{\sigma_{\bar{X}_{.1}}^2 + \sigma_{\bar{X}_{.2}}^2},$$

which equals

$$\sqrt{\frac{2\sigma_x^2}{n}} \tag{14.7}$$

when $n_1 = n_2 = n$. When $X_1$ and $X_2$ have a nonzero correlation coefficient of
$\rho_{12}$ (as they will have with two dependent samples), the standard error of
$\bar{X}_{.1} - \bar{X}_{.2}$ is

$$\sigma_{\bar{X}_{.1} - \bar{X}_{.2}} = \sqrt{\frac{2\sigma_x^2}{n}\,(1 - \rho_{12})}. \tag{14.8}$$

The $t$-test for testing $\mu_1 - \mu_2 = 0$ with independent groups contains
$\bar{X}_{.1} - \bar{X}_{.2}$ in the numerator and an estimate of Eq. (14.7) in the denominator.
The $t$-test for dependent groups contains $\bar{d}_{.} = \bar{X}_{.1} - \bar{X}_{.2}$ in the numerator
and an estimate of Eq. (14.8) in the denominator; note that the denominator
of Eq. (14.5), $s_d/\sqrt{n}$, is such an estimate.

If the independent groups $t$-test is performed with dependent
groups for which $X_1$ and $X_2$ are substantially positively correlated, the standard
error of $\bar{X}_{.1} - \bar{X}_{.2}$ will be overestimated, and significant differences
between the two means will be mistaken for nonsignificant ones. The opposite
error, mistaking nonsignificant differences for significant ones, would
frequently be made if the independent groups $t$-test is applied to dependent
groups in which $X_1$ and $X_2$ have a substantial negative correlation. Thus we
see that the ability to recognize and distinguish independent and dependent
samples is an important activity in applying inferential statistical
techniques. Knowing which technique to apply requires a familiarity with the
phenomena being studied plus an awareness of the statistical problems
attendant upon drawing dependent samples.
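The effect of the correlation on the standard error of the difference between two means can be made concrete with a small sketch. It assumes, as in this section, a common variance and equal sample sizes, so that the variance of the difference is $(2\sigma_x^2/n)(1 - \rho_{12})$; the function names and the numerical values ($\sigma_x^2 = 100$, $n = 25$) are hypothetical.

```python
import math

def se_independent(var, n):
    """Standard error of the difference between two uncorrelated means (equal n)."""
    return math.sqrt(2.0 * var / n)

def se_dependent(var, n, rho):
    """Standard error of the difference when the paired scores correlate rho."""
    return math.sqrt(2.0 * var / n * (1.0 - rho))

var, n = 100.0, 25   # hypothetical common variance and number of pairs
for rho in (-0.5, 0.0, 0.75):
    # Positive rho shrinks the true standard error; negative rho inflates it.
    print(rho, round(se_independent(var, n), 3), round(se_dependent(var, n, rho), 3))
```

With a positive correlation the independent-groups formula overstates the standard error, which is why applying it to dependent samples hides real differences; with a negative correlation it understates the standard error.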



14.5 

INFERENCES ABOUT THE 
VARIANCE, $\sigma_x^2$, OF A
POPULATION 

Beginning with this section, we shall deal with testing hypotheses about 
population variances. 

a. The hypothesis to be tested is that a population has a variance $\sigma_x^2$
equal to some number $a$ versus the hypothesis that $\sigma_x^2$ is different from $a$:

$$H_0\colon \sigma_x^2 = a$$
$$H_1\colon \sigma_x^2 \neq a.$$

b. It must be assumed that the variable $X$ has a normal distribution in
the population and that a random sample of $n$ observations has been selected
from which $\sigma_x^2$ will be estimated.

c. The test statistic for testing $H_0$ against $H_1$ is

$$\chi^2 = \frac{(n - 1)s_x^2}{a}, \tag{14.9}$$

where

$$s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_{.})^2}{n - 1}.$$

d. When $H_0$ is true, the sampling distribution of $\chi^2$ in Eq. (14.9) is the
chi-square distribution with $n - 1$ df, i.e., $\chi^2_{n-1}$; when $H_1$ is true and $\sigma_x^2$ is
actually equal to some number $b$ different from $a$, the sampling distribution
of $(n - 1)s_x^2/a$ will equal $b/a$ times $\chi^2_{n-1}$. For example, the graphs of
$(n - 1)s_x^2/10$ are drawn in Fig. 14.2 for the case in which $\sigma_x^2 = 10$, i.e., $H_0$
is true, and $\sigma_x^2 = 20$, i.e., $H_1$ is true, with $n = 9$.

If, from a sample of size 9, an $s_x^2$ of 21.40 was obtained, the value of the
test statistic in Eq. (14.9) would be

$$\chi^2 = \frac{(n - 1)s_x^2}{10} = \frac{8(21.40)}{10} = 17.12.$$

In Fig. 14.2 we see that a value of the test statistic as large as 17.12 or
larger is relatively improbable when $\sigma_x^2 = 10$ but is quite reasonable when
$\sigma_x^2 = 20$ is true. On the basis of the evidence then, we are inclined to reject
$H_0$ and support $H_1$.
e. The critical values for testing $H_0$ against $H_1$ at the $\alpha$-level of
significance are the $\alpha/2$ and $1 - (\alpha/2)$ percentile points in the chi-square
distribution with df $= n - 1$, i.e.,

$${}_{\alpha/2}\chi^2_{n-1} \quad \text{and} \quad {}_{1-(\alpha/2)}\chi^2_{n-1}.$$

f. The $100(1 - \alpha)\%$ confidence interval on the value of $\sigma_x^2$ is

$$\frac{(n - 1)s_x^2}{{}_{1-(\alpha/2)}\chi^2_{n-1}} < \sigma_x^2 < \frac{(n - 1)s_x^2}{{}_{\alpha/2}\chi^2_{n-1}}. \tag{14.10}$$

g. In past years the standard deviation of pupils’ scores has been .80
in grade placement units on a standardized arithmetic achievement test
given at the end of the third grade. Unlike previous years, this year saw
the introduction of a programmed course of arithmetic study in the third
grade. One of the highly touted features of programmed instruction is that
it does a better job of accommodating individual differences in learning rate
than do traditional methods. Hence, this year’s third grade
students should show a different variance on the arithmetic achievement
test than in past years. The arithmetic achievement test is administered to
a random sample of 25 pupils.
The data will be used to test the following hypotheses at the .10 level of
significance:

$$H_0\colon \sigma_x^2 = (.80)^2 = .64$$
$$H_1\colon \sigma_x^2 \neq .64.$$

The variance of the sample of 25 test scores was found to equal 1.14.
The value

$$\chi^2 = \frac{24(1.14)}{.64} = 42.75$$

is compared with the critical values

$${}_{.05}\chi^2_{24} = 13.85 \quad \text{and} \quad {}_{.95}\chi^2_{24} = 36.42,$$

and is seen to be significant at the .10 level. The probability $p$ of a value of
$\chi^2$ equal to 42.75 or more given that $H_0$ is true is approximately .01.

The 90% confidence interval on $\sigma_x^2$ is found by substituting the sample
data into Eq. (14.10):

$$\frac{24(1.14)}{36.42} < \sigma_x^2 < \frac{24(1.14)}{13.85}, \quad \text{or} \quad 0.75 < \sigma_x^2 < 1.98.$$


The conclusion seems obligatory that the population of third graders 
taught by programmed instruction has a greater variance on the arithmetic 
achievement test than in past years. Notice that inferential statistical methods 
have made it unnecessary to administer the test to any more than 25 pupils. 
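The chi-square computations in this illustration can be retraced with the sketch below. The function and argument names are hypothetical, and the two critical values (13.85 and 36.42) are the tabled percentile points quoted in the text, passed in rather than computed.

```python
def variance_test(sample_var, n, hypothesized_var, chi2_lower, chi2_upper):
    """Chi-square test of H0: variance = hypothesized_var, with the interval of Eq. (14.10).

    chi2_lower and chi2_upper are the tabled alpha/2 and 1 - (alpha/2)
    percentile points of chi-square with n - 1 df.
    """
    sum_of_squares = (n - 1) * sample_var
    chi2 = sum_of_squares / hypothesized_var            # Eq. (14.9)
    reject = chi2 < chi2_lower or chi2 > chi2_upper
    ci = (sum_of_squares / chi2_upper, sum_of_squares / chi2_lower)
    return chi2, reject, ci

# Arithmetic achievement illustration: n = 25, sample variance 1.14, H0 variance .64.
chi2, reject, ci = variance_test(1.14, 25, 0.64, chi2_lower=13.85, chi2_upper=36.42)
print(round(chi2, 2), reject, [round(v, 2) for v in ci])
```

Note that the larger percentile point goes into the lower limit of the interval and the smaller into the upper limit, as in Eq. (14.10).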


h. Unlike hypothesis tests about means using Student’s $t$-distribution,
the assumption of sampling from a normal population cannot be taken
lightly when testing hypotheses about population variances (see Scheffé,
1959, chap. 10). If the population is nonnormal, particularly if it departs
substantially from mesokurtosis, the hypothesis test outlined above may be
quite in error.


14.6 

INFERENCES ABOUT $\sigma_1^2/\sigma_2^2$
USING INDEPENDENT 
SAMPLES 


a. The problem at hand possesses greater practical significance than
that of testing whether a population has a variance equal to some hypothesized
value $a$. For now we are concerned with two populations (1 and 2), and we
wish to test whether their variances $\sigma_1^2$ and $\sigma_2^2$ are the same or different:

$$H_0\colon \sigma_1^2 = \sigma_2^2$$
$$H_1\colon \sigma_1^2 \neq \sigma_2^2.$$

b. It is assumed that a sample of size $n_1$ is drawn at random from a
normal population with mean $\mu_1$ and variance $\sigma_1^2$; an independent random
sample of size $n_2$ is drawn from a second normal population with mean $\mu_2$



SELECTED INFERENTIAL TECHNIQUES 


CHAP. 14 


and variance $\sigma_2^2$. The values of $\mu_1$ and $\mu_2$ are immaterial and of no interest
in testing $H_0$.

c. The test statistic for testing $H_0$ against $H_1$ is the ratio of the two sample
variances:

$$F = \frac{s_1^2}{s_2^2}. \tag{14.11}$$

d. When $H_0\colon \sigma_1^2 = \sigma_2^2$ is true, the sampling distribution of $F = s_1^2/s_2^2$ is
the $F$-distribution with $n_1 - 1$ and $n_2 - 1$ df. When $\sigma_1^2 \neq \sigma_2^2$, the
distribution of $s_1^2/s_2^2$ is equal to $\sigma_1^2/\sigma_2^2$ times the $F$-distribution with $n_1 - 1$ and
$n_2 - 1$ df. Thus if in reality $\sigma_1^2/\sigma_2^2 = 2$, the distribution of $s_1^2/s_2^2$ will look
like the $F$-distribution transformed by a multiplicative factor of 2.

e. The critical values against which $F$ in Eq. (14.11) is compared in
testing $H_0$ against $H_1$ at the $\alpha$-level of significance are

$${}_{\alpha/2}F_{n_1-1,\,n_2-1} \quad \text{and} \quad {}_{1-(\alpha/2)}F_{n_1-1,\,n_2-1},$$

the $100(\alpha/2)$ and $100[1 - (\alpha/2)]$ percentile points in the $F$-distribution
with $n_1 - 1$ and $n_2 - 1$ df. The upper percentile points in the $F$-distributions
can be read directly from Table E; the lower percentile
points are related as follows to the upper percentiles:

$${}_{\alpha/2}F_{n_1-1,\,n_2-1} = \frac{1}{{}_{1-(\alpha/2)}F_{n_2-1,\,n_1-1}}. \tag{14.12}$$

f. The $100(1 - \alpha)\%$ confidence interval on the ratio of $\sigma_1^2$ to $\sigma_2^2$ is
constructed as follows:

$${}_{\alpha/2}F_{n_2-1,\,n_1-1}\,\frac{s_1^2}{s_2^2} < \frac{\sigma_1^2}{\sigma_2^2} < {}_{1-(\alpha/2)}F_{n_2-1,\,n_1-1}\,\frac{s_1^2}{s_2^2}. \tag{14.13}$$

g. In a study by Sears (1940), children were given familiar arithmetic
tasks, and upon completion a randomly chosen half were told they had
failed and the remaining half were told they had succeeded. Each child was then
asked to estimate the number of seconds it would take him to complete the
next task. Observations were made by the experimenter of the difference
between a child’s goal (in seconds) on the task to be performed and his
performance on the task just completed. It was thought that those who
experienced failure on the preceding task might be more erratic in establishing
their goals.

The hypothesis to be tested at the .05 level of significance is that the
variance of these discrepancy scores in the population of children is
the same whether they have been told they failed or successfully completed
a preliminary task. Sears obtained the following data:

    Group 1                     Group 2
    (told they were             (told they were
    successful)                 unsuccessful)

    $n_1 = 12$                  $n_2 = 12$
    $s_1^2 = 8.16$              $s_2^2 = 90.45$

The value of the test statistic in Eq. (14.11) is

$$F = \frac{8.16}{90.45} = 0.090.$$

This value is compared with the critical values that are read approximately
from Table E:

$${}_{.975}F_{11,11} = 3.47 \quad \text{and} \quad {}_{.025}F_{11,11} = \frac{1}{{}_{.975}F_{11,11}} = \frac{1}{3.47} = 0.29.$$

Since $F = 0.09$ falls below the lower critical value, $H_0\colon \sigma_1^2 = \sigma_2^2$ is
rejected at the .05 level of significance.

The 95% confidence interval on $\sigma_1^2/\sigma_2^2$ is constructed from Eq. (14.13):

$$(0.29)\frac{8.16}{90.45} < \frac{\sigma_1^2}{\sigma_2^2} < (3.47)\frac{8.16}{90.45}, \quad \text{or} \quad 0.03 < \frac{\sigma_1^2}{\sigma_2^2} < 0.31.$$

h. The choice of which sample to designate 1 and which to designate 2
is usually arbitrary, but it must not be made after observing which sample
variance is larger. To observe the larger sample variance and then designate
it $s_1^2$ would effectively double the probability of a type I error over what that
probability was believed to be. Determining which sample variance to
designate $s_1^2$ could be done by the flip of a coin. If one’s purpose
is only to construct an interval estimate of the ratio of the population
variances, the above concern over designating $s_1^2$ and $s_2^2$ is immaterial.

The assumption that $s_1^2$ and $s_2^2$ are derived from independent samples
from normal populations cannot be taken lightly, unlike the normality
assumption underlying tests of means. If substantial nonnormality is
expected the significance test in Sec. 15.14 should be used.

In the past it was customary to test $H_0\colon \sigma_1^2 = \sigma_2^2$ prior to performing a
$t$-test of the hypothesis $\mu_1 - \mu_2 = 0$. The former hypothesis is a statement
of the homogeneous variances assumption made in testing the latter
hypothesis. It was advised in the textbooks of the time not to proceed
with the $t$-test of Sec. 14.3 if $s_1^2/s_2^2$ led to rejection of $H_0\colon \sigma_1^2 = \sigma_2^2$.
Although such advice stemmed from an admirable concern for meeting the
assumptions of the tests employed, it proved to be generally poor advice.
In particular, the preliminary test of the assumption of homogeneous
variances can be largely invalidated by nonnormality of the populations;
but this same nonnormality is of no consequence to the validity of the $t$-test
of $\mu_1 - \mu_2 = 0$. In fact, if $n_1 = n_2$ there is no reason to be concerned about
violation of the assumption of homogeneous variances. The only circumstance
in which one might advisedly test $H_0\colon \sigma_1^2 = \sigma_2^2$ prior to testing
$H_0\colon \mu_1 - \mu_2 = 0$ is when there is good evidence that the populations are normally
distributed and $n_1$ and $n_2$ are quite unequal. It may be possible to find a
simple transformation of the observations that will give them a more nearly 
normal distribution. The subject of “normalizing transformation” is dealt 
with in detail in P. O. Johnson’s Statistical Methods in Research (1949, 
chap. 7). 
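The variance-ratio test of this section, applied to the Sears data ($s_1^2 = 8.16$, $s_2^2 = 90.45$, $n_1 = n_2 = 12$), can be sketched as follows. The function and argument names are hypothetical. Only upper percentile points of $F$ are passed in, as they would be read from a table (3.47 for 11 and 11 df in the text); the lower critical value is obtained from the reciprocal relation of Eq. (14.12).

```python
def variance_ratio_test(var1, var2, f_upper_12, f_upper_21):
    """F test of equal variances and the interval of Eq. (14.13).

    f_upper_12: tabled 1 - (alpha/2) percentile of F with (n1 - 1, n2 - 1) df.
    f_upper_21: tabled 1 - (alpha/2) percentile of F with (n2 - 1, n1 - 1) df.
    """
    f = var1 / var2                          # Eq. (14.11)
    f_lower_12 = 1.0 / f_upper_21            # Eq. (14.12)
    reject = f < f_lower_12 or f > f_upper_12
    # Lower limit uses the alpha/2 point of F(n2-1, n1-1), which is 1 / f_upper_12.
    ci = (f / f_upper_12, f * f_upper_21)
    return f, f_lower_12, reject, ci

# Sears illustration; both df are 11, so the same tabled value serves twice.
f, f_lower, reject, ci = variance_ratio_test(8.16, 90.45, f_upper_12=3.47, f_upper_21=3.47)
print(round(f, 3), round(f_lower, 2), reject, [round(v, 2) for v in ci])
```

When the two sample sizes differ, the two tabled values differ as well, which is why the sketch keeps them as separate arguments.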


14.7 

INFERENCES ABOUT $\sigma_1^2/\sigma_2^2$
USING DEPENDENT SAMPLES 


a. As in Sec. 14.6, the null hypothesis being tested here is that two
populations have the same variance:

$$H_0\colon \sigma_1^2 = \sigma_2^2$$
$$H_1\colon \sigma_1^2 \neq \sigma_2^2.$$

b. It is assumed that two possibly dependent samples are drawn, one of
size $n$ from a normal population with variance $\sigma_1^2$ and the other of the same
size from a normal population with variance $\sigma_2^2$. The values of $\mu_1$ and
$\mu_2$ are immaterial. Dependent samples were discussed in Sec. 14.4.

c. The test statistic used in testing $H_0$ against $H_1$ is

$$t = \frac{s_1^2 - s_2^2}{\sqrt{\dfrac{4 s_1^2 s_2^2}{n - 2}\,(1 - r_{12}^2)}}, \tag{14.14}$$

where $s_1^2$ and $s_2^2$ are the unbiased estimates of the variances for samples 1
and 2, respectively,
$r_{12}$ is the correlation between the observation in sample 1 and the paired
observation in sample 2, i.e., the correlation coefficient calculated on the
$n$ paired observations, and
$n$ is the number of pairs. Recall that it is possible to “pair off” the
observations in two dependent samples into $n$ pairs: “before-after,”
“brother-sister,” etc.

d. When $H_0\colon \sigma_1^2 = \sigma_2^2$ is true the sampling distribution of $t$ in Eq.
(14.14) is Student’s $t$-distribution with $n - 2$ df.

e. The critical values for testing $H_0$ against $H_1$ at the $\alpha$-level of
significance are:

$$-{}_{1-(\alpha/2)}t_{n-2} \quad \text{and} \quad {}_{1-(\alpha/2)}t_{n-2}.$$

f. Constructing a confidence interval on $\sigma_1^2 - \sigma_2^2$ or $\sigma_1^2/\sigma_2^2$ when estimates
of the variances are obtained from dependent samples presents difficulties
beyond the scope of this textbook.

g. Lord (1963b) reported data gathered by William E. Coffman on the
performance of a sample of 95 students on the Stanford Achievement Test
in the seventh and eighth grades. One might wonder whether, in the
population of students sampled, performance is more uniform (less variable)
in the seventh or the eighth grade.

Sample 1 is the set of scores on the Stanford Achievement Test earned
by the 95 students in grade seven, and sample 2 is the set of scores earned
by the same 95 students in grade eight. Thus the two samples are not
independent. The following data were obtained to test $H_0\colon \sigma_1^2 = \sigma_2^2$ at the
.10 level of significance:

    Sample 1                    Sample 2
    (grade 7)                   (grade 8)

    $n = 95$                    $n = 95$
    $s_1^2 = 134.56$            $s_2^2 = 201.64$

                $r_{12} = .876$

The value of $r_{12} = .876$ is the product-moment correlation coefficient
between the 95 students’ performance in the seventh and eighth grades.
From Eq. (14.14),

$$t = \frac{134.56 - 201.64}{\sqrt{\dfrac{4(134.56)(201.64)}{93}\,(1 - .876^2)}} = -4.07.$$

The critical values with which the obtained $t$ of $-4.07$ is compared are
$-{}_{.95}t_{93} = -1.66$ and ${}_{.95}t_{93} = 1.66$.

Thus we see that evidence exists to conclude that in the populations
sampled (seventh-grade students and eighth-grade students) the variances $\sigma_1^2$
and $\sigma_2^2$ are different. The probability of obtaining a value of $t$ as discrepant
from zero as that obtained is less than .001 if $\sigma_1^2$ is truly equal to $\sigma_2^2$.
Performance on the Stanford Achievement Test is more variable among
eighth-grade than among seventh-grade students.
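The statistic of Eq. (14.14) is simple to evaluate directly. The sketch below, with a hypothetical function name, reproduces the value obtained for the Stanford Achievement Test data ($s_1^2 = 134.56$, $s_2^2 = 201.64$, $r_{12} = .876$, $n = 95$).

```python
import math

def dependent_variances_t(var1, var2, r12, n):
    """t statistic of Eq. (14.14) for equal variances in dependent samples (n - 2 df)."""
    denominator = math.sqrt(4.0 * var1 * var2 / (n - 2) * (1.0 - r12 ** 2))
    return (var1 - var2) / denominator

# Grade 7 versus grade 8 variances for the same 95 students.
t = dependent_variances_t(var1=134.56, var2=201.64, r12=0.876, n=95)
print(round(t, 2))
```

The result would then be compared with the tabled percentile points of Student’s $t$ with $n - 2 = 93$ df, as in the text.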


14.8 

INFERENCES ABOUT p, THE 
POPULATION PRODUCT- 
MOMENT CORRELATION 
COEFFICIENT 


a. The hypothesis being tested is that the product-moment correlation
coefficient between variables $X$ and $Y$, i.e., $\rho_{xy}$, is equal to some value
$a$ (between $-1$ and $+1$, of course) in the population sampled:

$$H_0\colon \rho_{xy} = a$$
$$H_1\colon \rho_{xy} \neq a.$$

b. It is assumed that a random sample of $n$ paired observations $(X_i, Y_i)$
is drawn from a bivariate normal population (see Sec. 6.7) in which the
correlation of $X$ and $Y$ is $\rho_{xy}$. The means and variances for the bivariate
normal population are not of interest.

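c. To test $H_0$ against $H_1$, one calculates $r_{xy}$, the sample product-moment
correlation coefficient. The value of $r_{xy}$ is then transformed by
means of Fisher’s $Z$-transformation (see Table G in Appendix A) into $Z_r$.
The test statistic for testing the null hypothesis that $\rho_{xy} = a$ is

$$z = \frac{Z_r - Z_a}{1/\sqrt{n - 3}}, \tag{14.15}$$

where $Z_r$ is the $Z$-transformed value corresponding to the sample value of $r_{xy}$,
$Z_a$ is the $Z$-transformed value corresponding to $a$, the value
hypothesized in $H_0$, and
$n$ is the sample size.

d. If $H_0$ is true, i.e., $\rho_{xy} = a$, then $z$ in Eq. (14.15) is normally
distributed with mean 0 and standard deviation 1, i.e., $z$ has the unit normal
distribution. If, on the other hand, $H_1$ is true and $\rho_{xy} = b$, where $b$ is different
from $a$, then $z$ in Eq. (14.15) has a normal distribution with standard
deviation 1 and centering around $(Z_b - Z_a)/(1/\sqrt{n - 3})$. For example,
if $H_0\colon \rho_{xy} = 0$ is true and $n = 12$, the sampling distribution of
$z = (Z_r - 0)/(1/\sqrt{9})$ is the curve on the left in Fig. 14.3; if $\rho_{xy} = .60$ is
true and $n = 12$, the sampling distribution of $z$ is the curve on the right.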
FIG. 14.3 Sampling distributions of $z = (Z_r - 0)/(1/\sqrt{n - 3})$ for
samples of size 12 when $\rho_{xy} = 0$ and when $\rho_{xy} = .60$.


e. The critical values against which $z$ in Eq. (14.15) is compared in
testing $H_0$ against $H_1$ at the $\alpha$-level of significance are the $100(\alpha/2)$ and
$100[1 - (\alpha/2)]$ percentiles in the unit normal distribution:

$${}_{\alpha/2}z \quad \text{and} \quad {}_{1-(\alpha/2)}z.$$

For example, if $H_0$ is tested at the .05 level, the critical values are $-1.96$
and $+1.96$.

f. Confidence intervals are constructed on $\rho_{xy}$ by first constructing a
confidence interval on $Z_\rho$ around $Z_r$ and transforming the upper and lower
limits on this interval back to the scale of $r$ by reading Table G in reverse.

The first step in building the $100(1 - \alpha)\%$ confidence interval on $\rho_{xy}$ is
to calculate

$$Z_r \pm \frac{{}_{1-(\alpha/2)}z}{\sqrt{n - 3}}, \tag{14.16}$$

where $Z_r$ and $n$ are as defined in Eq. (14.15) above, and
${}_{1-(\alpha/2)}z$ is the $100[1 - (\alpha/2)]$ percentile in the unit normal distribution.

Equation (14.16) will determine two points on the $Z$-transformation
scale. Table G is entered with these two values and the two corresponding
values of $r_{xy}$ are determined. These two values of $r_{xy}$ constitute the
$100(1 - \alpha)\%$ confidence interval on $\rho_{xy}$.

g. Forehand and Libby (1962) gathered data on the correlation between
“supervisors’ ratings of innovative behavior,” $X$, and scores on tests of
divergent thinking (“flexibility” and “implications,” known in the vernacular
as tests of “creativity”), $Y$, on $n = 60$ governmental administrators. An
$r_{xy}$ of .30 was obtained.

We shall test the null hypothesis with $\alpha = .05$ that $\rho_{xy}$ is zero:

$$H_0\colon \rho_{xy} = 0$$
$$H_1\colon \rho_{xy} \neq 0.$$
From Table G, the Z-transformed value of an r of .30 is .310, i.e.,
Z_.30 = .310. Of course, Z_0 = 0. Thus, the value of z in Eq. (14.15) is

    z = \frac{.310 - 0}{1/\sqrt{60 - 3}} = .310(\sqrt{57}) = 2.34.

The obtained value of z is compared with _{.025}z = -1.96 and
_{.975}z = 1.96.

Since 2.34 exceeds 1.96, H_0 can be rejected and H_1 accepted at the .05 level
of significance. The 95% confidence interval on ρ_xy is constructed as follows:

    Z_r \pm (1.96)\frac{1}{\sqrt{60 - 3}} = .310 \pm .259 = (.051, .569).

From Table G, the values of r_xy corresponding to Z-values of .051 and
.569 are approximately .05 and .51, respectively.
Thus the 95% confidence interval on ρ_xy is (.05, .51).
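
The worked example above can be checked numerically. The sketch below is only an illustration, not part of the original text: it assumes that Table G can be replaced by the inverse hyperbolic tangent (Fisher's Z is atanh r) and that the unit normal percentile can be taken from Python's standard library. The function name `fisher_z_test` is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Test H0: rho = rho0 and build a 100(1 - alpha)% interval on rho."""
    se = 1.0 / sqrt(n - 3)                       # standard error of Z_r
    z = (atanh(r) - atanh(rho0)) / se            # Eq. (14.15)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    lower = atanh(r) - crit * se                 # Eq. (14.16), on the Z scale
    upper = atanh(r) + crit * se
    return z, crit, (tanh(lower), tanh(upper))   # limits back on the r scale


# Forehand and Libby example: r = .30, n = 60
z, crit, ci = fisher_z_test(r=0.30, n=60)
print(f"z = {z:.2f}, critical value = {crit:.2f}")
print(f"95% interval on rho: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because atanh is used in place of a three-decimal table, the results may differ from hand calculation in the last digit.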

h. In most instances of testing hypotheses about ρ_xy, the value of ρ_xy
hypothesized in H_0 is 0. Hence, (Z_r - 0)/(1/\sqrt{n - 3}) = Z_r \sqrt{n - 3}
is the test statistic when ρ_xy is hypothesized to be 0. (The hypothesis test
outlined above is an approximation, but one so close to the exact test for
even small n that the fact that it is an approximation should cause no
concern.) When ρ_xy = 0,

    t = \frac{r_{xy}}{\sqrt{(1 - r_{xy}^2)/(n - 2)}}

has the Student t-distribution with n - 2 df. This fact was used to construct
a table of critical values of r to facilitate testing H_0: ρ_xy = 0. Table I in
Appendix A lists the values which r must exceed in absolute value to
constitute evidence for rejection of H_0. For example, with n = 60, a
sample r must lie above approximately .250 or below -.250 before
H_0: ρ_xy = 0 can be rejected at the .05 level of significance. (Table I is
entered for particular values of α and df = n - 2.) An r of .30 leads to
rejection, as was seen above. As a further example of how to read Table I,
the critical values of r_xy for testing H_0 at the .01 level of significance
with a sample of size 12 are .708 and -.708.
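
A table of critical values of r follows directly from the t statistic for r_xy: solving t = r/\sqrt{(1 - r^2)/(n - 2)} for r gives r = t/\sqrt{t^2 + n - 2}. The sketch below illustrates that inversion. It is an assumption-laden example rather than the procedure used to build Table I: the critical t values (3.169 for 10 df at the .01 level, about 2.002 for 58 df at the .05 level) are supplied by hand from a standard t table, and both function names are hypothetical.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for H0: rho = 0, referred to Student's t with n - 2 df."""
    return r / sqrt((1 - r ** 2) / (n - 2))


def critical_r(t_crit, n):
    """Smallest |r| that reaches the tabled critical t with n - 2 df."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)


# Two-tailed .01 level, n = 12 (10 df); tabled t assumed to be 3.169
print(round(critical_r(3.169, 12), 3))
# Two-tailed .05 level, n = 60 (58 df); tabled t assumed to be about 2.002
print(round(critical_r(2.002, 60), 3))
```

Feeding a critical r back into `t_from_r` returns the critical t, which is a quick consistency check on the inversion.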

* N °"“ “1 Hjcta iZJZ Ca^VSr” ° f wmi “ 
14.9

INFERENCES ABOUT ρ_1 - ρ_2
USING INDEPENDENT
SAMPLES


a. In this instance, inferences concern the possible difference between
the correlation ρ_1 of X and Y in population 1 and the correlation ρ_2 of the
same two variables in population 2. For example, are aptitude X and
achievement Y more highly correlated for boys (population 1) than for girls
(population 2)? The null hypothesis is generally that ρ_1 = ρ_2; the
alternative hypothesis is that they are unequal:

    H_0: ρ_1 = ρ_2
    H_1: ρ_1 ≠ ρ_2.


b. It is assumed that a random sample of size n_1 is drawn from a
bivariate-normal population 1 in which the product-moment correlation
coefficient is ρ_1 and that an independent random sample of size n_2 is drawn
from a bivariate-normal population 2 with correlation ρ_2.

c. Again the inferential statistical problems are handled by means of
Fisher’s Z-transformation of r. Samples 1 and 2 are drawn from populations
1 and 2, respectively. The two sample correlation coefficients, r_1 and r_2,
are calculated and then transformed to Z_{r_1} and Z_{r_2} by means of Table G.
The test statistic for testing H_0: ρ_1 = ρ_2 is:

    z = \frac{Z_{r_1} - Z_{r_2}}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}}        (14.17)

d. If in fact ρ_1 equals ρ_2, then z in Eq. (14.17) has a normal distribution
with mean 0 and standard deviation 1 over repeated pairs of independent
random samples. If ρ_1 and ρ_2 actually differ, then the mean of the sampling
distribution of z in Eq. (14.17) will shift away from 0 (it will become
Z_{ρ_1} - Z_{ρ_2} divided by the denominator of Eq. (14.17)), but the
standard deviation will remain equal to 1.

e. To test H_0: ρ_1 = ρ_2 against H_1: ρ_1 ≠ ρ_2 at the α-level of
significance, the single calculated value of z in Eq. (14.17) is compared
with the 100(α/2) and 100[1 - (α/2)] percentiles in the unit normal
distribution:

    _{α/2}z and _{1-(α/2)}z.

f. Confidence intervals on ρ_1 - ρ_2 are constructed by way of confidence
intervals on the Z-transformation of ρ_1 and ρ_2. To find the 100(1 - α)%
confidence interval on ρ_1 - ρ_2, one first calculates

    (Z_{r_1} - Z_{r_2}) \pm {}_{1-(\alpha/2)}z \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}.        (14.18)

The two values on the Z scale obtained from Eq. (14.18) are then
transformed to the r scale by means of Table G. The resulting two values
of r form the 100(1 - α)% confidence interval on ρ_1 - ρ_2. The problems
of attaching meaning to a value for ρ_1 - ρ_2 will be discussed under (h).


g. The techniques of this section will be illustrated on data based loosely
on Hinton (1939) and Dispensa (1938). In a sample of 200 children ranging
in age from six to 15, the correlation between intelligence (Stanford-
Binet) and basal metabolic rate was r_1 = .71. In a sample of 78 adults
ranging in age from 18 to 25, intelligence and basal metabolic rate correlated
r_2 = .28. Hence,


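    Z_{r_1} = .887,    Z_{r_2} = .288.

The value of z in Eq. (14.17) is

    z = \frac{.887 - .288}{\sqrt{\frac{1}{200 - 3} + \frac{1}{78 - 3}}} = \frac{.599}{.136} = 4.40.

The obtained z of 4.40 exceeds _{.995}z = 2.58, so H_0: ρ_1 = ρ_2 can be
rejected at the .01 level of significance.

The 99% confidence interval on ρ_1 - ρ_2 is found from Eq. (14.18) as follows:

    .887 - .288 \pm 2.58 \sqrt{\frac{1}{200 - 3} + \frac{1}{78 - 3}} = .599 \pm .350 = (.249, .949).

Transforming these two values back to the r-scale yields the 99%
confidence interval on ρ_1 - ρ_2: approximately (.24, .74).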
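
The two-sample calculation can be sketched as follows. This is an illustrative check under stated assumptions, not the text's own procedure: atanh stands in for Table G, the normal percentile comes from the standard library, and the function name `compare_independent_r` is hypothetical. Because it carries more decimals than the hand calculation, it gives z of about 4.42 rather than 4.40.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_r(r1, n1, r2, n2, alpha=0.01):
    """z of Eq. (14.17) and the Z-scale limits of Eq. (14.18)."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))       # standard error of Z_r1 - Z_r2
    diff = atanh(r1) - atanh(r2)                 # difference on the Z scale
    z = diff / se                                # Eq. (14.17)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.58 when alpha = .01
    return z, (diff - crit * se, diff + crit * se)


# r_1 = .71 with n_1 = 200 children; r_2 = .28 with n_2 = 78 adults
z, z_limits = compare_independent_r(0.71, 200, 0.28, 78)
print(f"z = {z:.2f}")
print(f"Z-scale limits: ({z_limits[0]:.3f}, {z_limits[1]:.3f})")
```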

h. Although testing the hypothesis H_0: ρ_1 = ρ_2 makes perfectly good
sense, merely estimating the difference between the two coefficients and
setting a confidence interval around the sample difference may lack meaning.
Suppose, for example, that ρ_1 - ρ_2 = .20. Any one of the following
conditions on ρ_1 and ρ_2 could hold: (1) ρ_1 = .90 and ρ_2 = .70;
(2) ρ_1 = .20 and ρ_2 = .00; (3) ρ_1 = .10 and ρ_2 = -.10. Even though in
each case the difference between ρ_1 and ρ_2 is .20, the meaning that
attaches to the three conditions is much different. In terms of the ability to
predict one variable from the other, for instance, the .20 difference in (1)
matters far more than the .20 difference in (2). For this reason, estimating
the difference ρ_1 - ρ_2
is of little interest, although testing hypotheses about that difference or
merely noting the values of r_1 and r_2 are both meaningful approaches.

Hypothesis testing and inferential techniques exist for making inferences
about a set of J correlation coefficients for variables X and Y when J
independent estimates, r_xy, are available. For example, one might test the
hypothesis that intelligence X and age at which a child begins to speak Y
are equally highly correlated for the three populations of first-, second-, and
third-born children. For a presentation of appropriate inferential techniques
see Marascuilo (1966).


14.10 

INFERENCES ABOUT ρ_xy - ρ_xz
USING DEPENDENT
SAMPLES


a. The null hypothesis to be tested is that a variable X has the same
correlation with two other variables, Y and Z, against the alternative
hypothesis that ρ_xy and ρ_xz are not equal:

    H_0: ρ_xy = ρ_xz
    H_1: ρ_xy ≠ ρ_xz.

This situation would arise if one wished to predict “academic success”
(as measured by a grade-point average) and had two potential predictors,
Y and Z. If for financial reasons only one predictor (Y or Z) could be used,
it would be wise to gather data, compute r_xy and r_xz, and then test the
hypothesis that the observed difference between the two sample r’s represented
a true difference between ρ_xy and ρ_xz.

b. It is assumed that there exist three bivariate-normal populations, one
for each of the pairs of variables X and Y, X and Z, and Y and Z. A single
random sample of n persons is drawn from which the three correlation
coefficients r_xy, r_xz, and r_yz are calculated. Obviously, these three
estimates of correlation are not independent.


c. The test statistic for testing H_0 against H_1 is:

    z = \frac{\sqrt{n}\,(r_{xy} - r_{xz})}{\sqrt{(1 - r_{xy}^2)^2 + (1 - r_{xz}^2)^2 - 2r_{yz}^3 - (2r_{yz} - r_{xy}r_{xz})(1 - r_{xy}^2 - r_{xz}^2 - r_{yz}^2)}}        (14.19)

where n is the sample size,

r_xy is the sample correlation of X and Y,
r_xz is the sample correlation of X and Z, and
r_yz is the sample correlation of Y and Z.
d. If H_0 is true, i.e., ρ_xy = ρ_xz, then z in Eq. (14.19) has a sampling
distribution over repeated samples of size n that is closely approximated by
the normal distribution with mean 0 and standard deviation 1 (see Olkin
and Siotani, 1964, and Olkin, 1967). When H_1 is true, the mean of the
sampling distribution of z in Eq. (14.19) shifts away from zero but the
standard deviation remains approximately equal to 1.

e. As might be expected, the null hypothesis H_0: ρ_xy = ρ_xz is tested at
the α-level of significance by comparing the observed value of z in Eq. (14.19)
with the 100(α/2) and 100[1 - (α/2)] percentile points in the unit normal
distribution; the two critical values for the hypothesis test are

    _{α/2}z and _{1-(α/2)}z.

f. Interval estimation of the difference ρ_xy - ρ_xz is difficult to justify
for the same reasons discussed at the end of Sec. 14.9. However, if such
estimation can ever be justified, appropriate techniques may be found in
Olkin (1967, p. 113).

g. Suppose that success in college X (as measured by grade-point average
at the end of the first year) can be predicted either from test battery Y or test
battery Z. Only one test battery can be employed because of the short
available testing time. It is desired to test the hypothesis H_0: ρ_xy = ρ_xz at
the .05 level of significance.

For a random sample of 100 freshmen who were given both test batteries
as high-school seniors, the three possible correlation coefficients among X,
Y, and Z are:

    r_xy = .56,    r_xz = .43,    r_yz = .52.

The value of z in Eq. (14.19) is found to be

    z = \frac{\sqrt{100}\,(.56 - .43)}{\sqrt{(1 - .56^2)^2 + (1 - .43^2)^2 - 2(.52)^3 - [2(.52) - (.56)(.43)](1 - .56^2 - .43^2 - .52^2)}}

      = \frac{10(.13)}{\sqrt{.6696}} = \frac{1.300}{.8183} = 1.59.

The obtained value of z falls far below the required critical value of 1.96.
The hypothesis that test batteries Y and Z are alike with respect to accuracy
in predicting X cannot be rejected, although a larger sample might alter this
conclusion.
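
The arithmetic in Eq. (14.19) is error-prone by hand, so a short sketch may help. It simply codes the test statistic as written above; the function name `dependent_r_z` is hypothetical. For the three correlations in the example it returns a z of roughly 1.59.

```python
from math import sqrt


def dependent_r_z(r_xy, r_xz, r_yz, n):
    """z of Eq. (14.19) for H0: rho_xy = rho_xz from one sample of size n."""
    variance_term = (
        (1 - r_xy ** 2) ** 2
        + (1 - r_xz ** 2) ** 2
        - 2 * r_yz ** 3
        - (2 * r_yz - r_xy * r_xz) * (1 - r_xy ** 2 - r_xz ** 2 - r_yz ** 2)
    )
    return sqrt(n) * (r_xy - r_xz) / sqrt(variance_term)


# 100 freshmen: r_xy = .56, r_xz = .43, r_yz = .52
z = dependent_r_z(0.56, 0.43, 0.52, 100)
print(f"z = {z:.2f}")   # compared with the critical values -1.96 and 1.96
```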
14.11

INFERENCES ABOUT OTHER
CORRELATION COEFFICIENTS


In this section, significance testing procedures for the correlation coefficients 
presented in Chapter 9 will be discussed. The presentation of inferential 
techniques will be greatly simplified since space does not permit detailed 
treatments such as those in the previous sections of this chapter. Procedures 
will be presented with which the null hypothesis of no correlation between 
variables X and Y can be tested against a nondirectional alternative hy- 
pothesis for most of the coefficients in Chapter 9. Where possible, a test 
statistic will be defined that has a known or approximately known sampling 
distribution given that the two variables being correlated (whether they are
measured dichotomously, with ranks, or otherwise) are unrelated, i.e., have a
coefficient of zero in the population. In each instance, the significance testing 
procedure will be illustrated with hypothetical data. 

The Phi Coefficient, φ

Suppose that a random sample of n persons is drawn from a population in
which two dichotomously scored variables, X and Y, have a population phi
coefficient of zero. For large values of n (20 or greater, say) the sampling
distribution of √n φ, where φ is the sample phi coefficient, is approximately
normal with mean 0 and standard deviation 1. (When the population phi
coefficient is different from zero, the sampling distribution of √n φ becomes
skewed and centers around a mean different from zero by an amount that
increases as the population value of φ deviates further from 0.)

A random sample of n = 25 persons is observed on two dichotomous
variables: X is the variable “sex” scored 0 = female and 1 = male; Y is the
variable “attrition” scored 0 = “dropped out of school” and 1 = “remained in
school.” The sample value of φ is equal to -.41. The test statistic is found
as follows:

    z = \sqrt{n}\,\phi = \sqrt{25}(-.41) = 5(-.41) = -2.05.

This value of z lies just below the 2.5th percentile in the unit normal
distribution (_{.025}z = -1.96). Hence, the hypothesis that sex and attrition
from high school are unrelated in the population sampled can be rejected at
the .05 level of significance. (It is interesting to note that a sample φ lying
as far from zero as -.30 would not have led to rejection of the null hypothesis
at the .05 level for a sample of 25 persons: √n φ = -1.50 > -1.96.)
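
A minimal sketch of the large-sample test for φ follows. It assumes the normal approximation described above and takes the critical value from the standard library; the function name `phi_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def phi_z_test(phi, n, alpha=0.05):
    """Return z = sqrt(n) * phi and whether H0 (phi = 0) is rejected."""
    z = sqrt(n) * phi
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    return z, abs(z) > crit


# Sex and attrition example: phi = -.41, n = 25
print(phi_z_test(-0.41, 25))
# A phi of -.30 with the same n does not reach significance
print(phi_z_test(-0.30, 25))
```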


Spearman’s Rank-Correlation
Coefficient, r_s


The sampling distribution of r_s given zero correlation between two sets of
ranks X and Y in the population cannot be characterized in terms of any
well-known statistical distributions for n less than approximately 10. For
n > 10, say, the sampling distribution of r_s when the population Spearman
rank-correlation coefficient is zero is related approximately to Student’s
t-distribution. In fact, for n > 10

    t = \frac{r_s}{\sqrt{(1 - r_s^2)/(n - 2)}}        (14.20)

is approximately distributed as Student’s t-distribution with n - 2 df when
the population value of r_s is zero.

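Suppose, for example, that in a sample of n = 22, the value of r_s is
.38. The value of t is

    t = \frac{.38}{\sqrt{\frac{1 - .38^2}{22 - 2}}} = \frac{.38}{.207} = 1.84.

The critical values for a test at the .01 level are the 0.5th and 99.5th
percentiles in Student’s t-distribution with 20 df, namely -2.845 and 2.845.
Since 1.84 lies between them, the hypothesis of zero correlation in the
population cannot be rejected at the .01 level.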
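
Equation (14.20) is a one-line computation. The sketch below is only an illustration: the function name `spearman_t` is hypothetical, and the critical value 2.845 (99.5th percentile of t with 20 df) is typed in from a standard t table because Python's standard library has no t quantile function.

```python
from math import sqrt


def spearman_t(r_s, n):
    """t of Eq. (14.20) and its degrees of freedom (intended for n > 10)."""
    df = n - 2
    return r_s / sqrt((1 - r_s ** 2) / df), df


t, df = spearman_t(0.38, 22)
T_CRIT_01 = 2.845            # tabled 99.5th percentile of t with 20 df
print(f"t = {t:.2f} with {df} df; reject at .01: {abs(t) > T_CRIT_01}")
```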

For small values of n, the exact sampling distribution of r_s under the
hypothesis of no relationship between the two ranked variables has been
tabulated (see Kendall, 1962) for various values of n. Critical values of r_s
derived from these sampling distributions are given in Table K of
Appendix A. As an example of how Table K is read, a value of r_s greater
than .794 or less than -.794 is required for significance at the .01 level when
n = 10.


Kendall’s τ


Inferences about Kendall’s τ are most conveniently made in terms of S, the
numerator of τ. As was seen in Chapter 9, for a sample of size n

    \tau = \frac{S}{n(n - 1)/2},

where S = P - Q is the difference between the total number of “agreements”
P and the total number of “inversions” Q in the two arrays of ranks (see
Sec. 9.3 for definitions of “agreements” and “inversions”).

The sampling distribution of S is more conveniently studied than that
of τ. When n is greater than or equal to 10, the sampling distribution of S
is nearly normal when X and Y are uncorrelated in the population; the
standard deviation of S is approximately

    \sqrt{n(n - 1)(2n + 5)/18}.


The sampling distribution of S can be made more nearly normal by a simple
“continuity correction” by which S is transformed into a quantity that
will be denoted by S*:

    If S is negative, S* = S + 1.

    If S is positive, S* = S - 1.

(In some textbooks, one is directed to calculate S* by |S| - 1. This
procedure is incorrect and approximately doubles the probability of a type I
error.)

It follows that

    z = \frac{S^*}{\sqrt{n(n - 1)(2n + 5)/18}}        (14.21)

closely follows the normal distribution with mean 0 and standard deviation
1 when n > 10 and the null hypothesis of no relationship between the two
variables is true.

Suppose that n = 10 and S = 9 so that τ = 9/[10(9)/2] = .20. Since S
is positive, S* = S - 1 = 9 - 1 = 8. The value of z is

    z = \frac{8}{\sqrt{10(10 - 1)[2(10) + 5]/18}} = \frac{8}{\sqrt{125}} = .72.

The probability is greater than .40 that a z score will lie either above
+0.72 or below -0.72. Hence we see that a τ of .20 for a sample of size
10 gives no evidence that would lead one to conclude the value of τ is different
from zero in the population sampled.
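
The continuity correction and Eq. (14.21) can be sketched as follows. The function name `kendall_z` is hypothetical, and the handling of S = 0 (leaving S* at 0) is an assumption, since the text defines S* only for positive and negative S.

```python
from math import sqrt
from statistics import NormalDist


def kendall_z(s, n):
    """Return tau, the continuity-corrected S*, and z of Eq. (14.21)."""
    if s > 0:
        s_star = s - 1
    elif s < 0:
        s_star = s + 1
    else:
        s_star = 0                       # assumption: no correction when S = 0
    tau = s / (n * (n - 1) / 2)
    sd_s = sqrt(n * (n - 1) * (2 * n + 5) / 18)
    return tau, s_star, s_star / sd_s


tau, s_star, z = kendall_z(9, 10)
two_tailed_p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"tau = {tau:.2f}, S* = {s_star}, z = {z:.2f}, p = {two_tailed_p:.2f}")
```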

For values of n from 4 to 10, Kendall (1962) tabulated certain
characteristics of the sampling distribution of S. Table L of Appendix A lists
the probabilities of obtaining values of S equal to or greater than certain
tabulated values when n = 4, ..., 10 and the population value of τ is zero.
For example, when n = 8 and the population value of τ is zero, we see
from Table L that the probability of obtaining a value of S equal to 14 or
greater is .054. S attains a value of -14 or less with probability .054. Thus
a value of S = 16 with an n of 8 is significantly different from zero with an
α of .10 for a two-tailed test.
The Point-Biserial Correlation
Coefficient, r_pb


By inspection of the formula for r_pb it can be seen that r_pb is equal to zero
if and only if \bar{X}_{.1} = \bar{X}_{.0}:

    r_{pb} = \frac{\bar{X}_{.1} - \bar{X}_{.0}}{s_x} \sqrt{\frac{n_1 n_0}{n(n - 1)}}.

Testing the null hypothesis that a population value of the point-biserial
correlation coefficient is zero is closely equivalent to testing the hypothesis that
in the population those persons scoring 1 on the dichotomous variable have
a mean equal to the population mean of the persons scoring 0 on the dichotomy,
i.e., H_0: μ_1 = μ_0. If r_pb is zero in the population sampled, then

    t = \frac{r_{pb}}{\sqrt{(1 - r_{pb}^2)/(n - 2)}}        (14.22)

is approximately distributed as Student’s t-distribution with n - 2 df.

Suppose that in a sample of size n = 18 the value of r_pb is .56. The
value of t is found as follows:

    t = \frac{.56}{\sqrt{(1 - .56^2)/(18 - 2)}} = \frac{.56}{.207} = 2.70.

This value of t exceeds the 97.5th percentile in Student’s t-distribution
with 16 df (2.12), so at the .05 level one rejects the hypothesis that the
point-biserial correlation coefficient is zero in the population sampled.
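
The equivalence between testing r_pb and testing a difference in means can be seen numerically. The sketch below is an illustration on made-up scores, not data from the text: it computes r_pb with s_x taken as the sample standard deviation with n - 1 in the denominator (an assumption about the Chapter 9 formula), converts it to t by Eq. (14.22), and compares the result with the pooled-variance two-sample t. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_from_r(r, n):
    """t of Eq. (14.22), referred to Student's t with n - 2 df."""
    return r / sqrt((1 - r ** 2) / (n - 2))


def point_biserial(scores, codes):
    """r_pb for scores paired with 0/1 codes (s_x uses n - 1)."""
    ones = [x for x, g in zip(scores, codes) if g == 1]
    zeros = [x for x, g in zip(scores, codes) if g == 0]
    n, n1, n0 = len(scores), len(ones), len(zeros)
    return (mean(ones) - mean(zeros)) / stdev(scores) * sqrt(n1 * n0 / (n * (n - 1)))


def pooled_t(scores, codes):
    """Ordinary pooled-variance t for the difference between the two means."""
    ones = [x for x, g in zip(scores, codes) if g == 1]
    zeros = [x for x, g in zip(scores, codes) if g == 0]
    ss = sum((x - mean(ones)) ** 2 for x in ones) + sum((x - mean(zeros)) ** 2 for x in zeros)
    pooled_var = ss / (len(scores) - 2)
    return (mean(ones) - mean(zeros)) / sqrt(pooled_var * (1 / len(ones) + 1 / len(zeros)))


scores = [3, 5, 4, 6, 8, 7, 9, 10]       # hypothetical data
codes = [0, 0, 0, 0, 1, 1, 1, 1]
r_pb = point_biserial(scores, codes)
print(t_from_r(r_pb, len(scores)), pooled_t(scores, codes))   # the two t values agree
print(round(t_from_r(0.56, 18), 2))                           # the worked example above
```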


The Tetrachoric Correlation
Coefficient, r_tet


When the hypothesis that the population tetrachoric correlation coefficient
is zero is true, the sampling distribution of the sample tetrachoric correlation
coefficient, r_tet, is approximately normal for moderately large n, with
mean 0 and standard deviation

    \sigma_{r_{tet}} = \frac{\sqrt{p_x q_x p_y q_y}}{u_x u_y \sqrt{n}}        (14.23)

where n is the sample size,

p_x is the proportion of persons scoring 1 on the dichotomously
measured X variable (q_x = 1 - p_x),

p_y is the proportion of persons scoring 1 on the dichotomously
measured Y variable (q_y = 1 - p_y),

u_x is the ordinate above the z-score on the unit normal curve above
which p_x proportion of the area lies, and

u_y is the ordinate above the z-score on the unit normal curve above
which p_y proportion of the area lies.

                   X
                0      1
    Y    1      4     19      23
         0     21      6      27
               25     25      n = 50

FIG. 14.4

Thus, for moderately large and large n

    z = \frac{r_{tet}}{\sigma_{r_{tet}}}

can be referred to the unit normal distribution to test the hypothesis that
the population tetrachoric correlation coefficient is zero.

Suppose the data in Fig. 14.4 are gathered in studying the relationship 
between two variables X and Y that are believed to be normally distributed 
but can only be measured dichotomously. 

The value of (bc)/(ad) is (21)(19)/(4)(6) = 399/24 = 16.625. In Table
H in Appendix A, 16.625 is found to correspond to an r_tet of .81.

The standard error of r_tet is estimated from the following data:

    p_x = 25/50 = .50    q_x = .50

    p_y = 23/50 = .46    q_y = .54.

From Table B, the ordinate on the unit normal curve at the point above
which p_x = .50 of the area lies is u_x = .3989; the ordinate on the curve at the
point above which p_y = .46 of the area lies is u_y = .3970. Thus the value of
z is

    z = \frac{.81}{\frac{1}{(.3989)(.3970)} \sqrt{\frac{(.50)(.46)(.50)(.54)}{50}}} = \frac{.81}{.2225} = 3.64.

A z score of 3.64 far exceeds even the 99.5th percentile in the unit normal 
distribution. We must conclude that the population tetrachoric correlation 
coefficient is nonzero. 
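
The standard error in Eq. (14.23) can be computed without Table B, since the ordinates u_x and u_y are heights of the unit normal density. The sketch below is a hedged illustration: the helper names are hypothetical, the value r_tet = .81 is taken from Table H as in the example (it is not computed here), and the ordinates come from the standard library rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def ordinate_above(p):
    """Height of the unit normal curve at the point above which proportion p lies."""
    return UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(1 - p))


def tetrachoric_z(r_tet, p_x, p_y, n):
    """Standard error of Eq. (14.23) under H0 and the resulting z."""
    u_x, u_y = ordinate_above(p_x), ordinate_above(p_y)
    se = sqrt(p_x * (1 - p_x) * p_y * (1 - p_y)) / (u_x * u_y * sqrt(n))
    return se, r_tet / se


# Fig. 14.4 data: p_x = 25/50, p_y = 23/50, n = 50, r_tet = .81 from Table H
se, z = tetrachoric_z(0.81, 25 / 50, 23 / 50, 50)
print(f"standard error = {se:.4f}, z = {z:.2f}")
```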


The Biserial Correlation
Coefficient, r_bis


The exact sampling distribution of r_bis is not known. It was never derived
by Pearson, and the only substantial finding was that of Soper (1914), who
derived the standard deviation of r_bis for large samples. McNamara and
Dunlap (1934) argued that when the population biserial correlation
coefficient is zero, then for large samples r_bis should be approximately
normally distributed with mean 0 and standard deviation

    \sigma_{r_{bis}} = \frac{\sqrt{n_1 n_0}}{u\,n\sqrt{n}}        (14.24)

where n is the sample size,

n_1 is the number of persons scoring 1 on the dichotomous variable
(n_0 = n - n_1), and

u is the ordinate on the unit normal curve at the point above which
n_1/n proportion of the area under the curve lies.


When the population biserial correlation coefficient departs from zero,
the value of σ_{r_bis} is diminished by 1/√n times the square of the population
value of r_bis. When the population value of r_bis is different from zero, the
sampling distribution of r_bis becomes nonnormal, being skewed toward zero.

Empirical sampling studies by Lord (1963a) and Baker (1965) have
shown that the above large-sample estimate of the standard error of r_bis is
quite nearly exact; in Baker’s case this was true even for samples of size 15.

Suppose that for a sample of size 36 in which n_1 = 16 and n_0 = 20,
the value of r_bis is -.145. The value of σ_{r_bis} when the population biserial
coefficient is zero is

    \sigma_{r_{bis}} = \frac{\sqrt{(16)(20)}}{(.3951)\,36\sqrt{36}} = .210.

If the population value of r_bis is zero, r_bis/σ_{r_bis} will be approximately
normally distributed with mean 0 and standard deviation 1 over repeated
random samples of size n.

    z = \frac{r_{bis}}{\sigma_{r_{bis}}} = \frac{-.145}{.210} = -.69.

A z value of -.69 appears to be a not unusual observation to obtain
when sampling from a normal distribution with mean 0 and standard
deviation 1. Hence, evidence does not exist to allow rejection of the null
hypothesis of a population biserial correlation coefficient of zero.

The Rank-Biserial Correlation
Coefficient, r_rb


The Partial Correlation
Coefficient, r_xy.z


The partial correlation coefficient “between X and Y with variable Z held
constant” was developed in Sec. 9.4. There it was seen that the correlation
of the errors of estimate when both X and Y are predicted individually from
Z is

    r_{xy.z} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}.

The population partial correlation of X and Y with Z “partialed out”
can be denoted by

    \rho_{xy.z} = \frac{\rho_{xy} - \rho_{xz}\rho_{yz}}{\sqrt{(1 - \rho_{xz}^2)(1 - \rho_{yz}^2)}}.

When ρ_xy.z equals zero, the test statistic

    t = \frac{r_{xy.z}}{\sqrt{(1 - r_{xy.z}^2)/(n - 3)}}        (14.25)

has Student’s t-distribution with n - 3 df. Suppose that in a sample of
n = 12, the value of r_xy.z is .80. Then

    t = \frac{.80}{\sqrt{(1 - .80^2)/(12 - 3)}} = 4.00.

The 99.5th percentile in the t-distribution with 9 degrees of freedom is
3.250; thus the hypothesis of a zero population partial correlation coefficient
can be rejected at the .01 level of significance.
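
Both the partial correlation and its t test are short computations. In the sketch below the function names are hypothetical and the three correlations passed to `partial_r` are invented solely to exercise the formula; only the call with r_xy.z = .80 and n = 12 corresponds to the example above.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_t(r_xy_z, n):
    """t of Eq. (14.25) and its degrees of freedom, n - 3."""
    df = n - 3
    return r_xy_z / sqrt((1 - r_xy_z ** 2) / df), df


print(partial_r(0.5, 0.5, 0.5))     # hypothetical correlations
t, df = partial_t(0.80, 12)
print(f"t = {t:.2f} with {df} df")  # compared with the tabled 3.250
```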


14.12

INFERENCES ABOUT P, THE
POPULATION PROPORTION

a. We shall now be concerned with the proportion of “units” (persons,
families, schools, etc.) in a population that possess some characteristic
(blue eyes, a “modern math” curriculum, etc.). The proportion of the units
in the population that possess the characteristic in question will be denoted
by P, which equals the number of the units possessing the characteristic
divided by the total number of units in the population. For example, of the
60,000 pupils in a particular school district, 3000 are of Spanish-American
descent; hence, P = 3000/60,000 = .05 when the characteristic being
observed is “Spanish-American vs. non-Spanish-American descent.”

The hypothesis to be tested is that in a relatively large population, the
proportion P possessing a particular characteristic is equal to a value a that
lies between 0 and 1.0, inclusive:

    H_0: P = a
    H_1: P ≠ a.

b. For purposes of testing H_0 against H_1 it need only be assumed that
a random sample of size n is drawn from the population.

c. Within the sample of n units, the number f who possess the
characteristic in question is observed. The sample proportion p is the ratio of
f to n:

    p = f/n.

The statistic p is an estimator of P. In fact, if we think of a dichotomous
variable X that equals 1 when the unit observed possesses the characteristic
and 0 when it does not, then p is simply the sample mean of X:

    p = \frac{X_1 + \cdots + X_n}{n} = \bar{X}.

The population mean of X will be P. Thus p has all the properties of a
sample mean as an estimator of a population mean. A dichotomously scored
variable has a variance in the population that is given by

    \sigma_x^2 = E(X - P)^2 = P(1 - P)

where P is the population proportion. This fact will be used below.


d. If H_0 is true, P equals a. A sample mean \bar{X} has a sampling
distribution with mean μ and standard deviation σ_x/√n. The same is true
of p: if H_0 is true, the sampling distribution of p has a mean of a, the
population proportion, and a standard deviation of
σ_x/√n = \sqrt{a(1 - a)/n}:

    E(p) = a

    \sigma_p = \sqrt{\frac{a(1 - a)}{n}}.
The question of the shape of the sampling distribution of p is answered
by appealing to the central limit theorem. When n is sufficiently large, the
sampling distribution of p is approximately normal. How large n must be to
be considered sufficiently large depends on the value of P. Suppose, for
example, that an anthropologist wishes to estimate the proportion of
left-handed persons in an African tribe. (P is as small as .02 in some African
tribes, but P is greater than .10 in the United States.) The anthropologist
would do well to draw a sample as large as 500. As a general rule, if nP or
n(1 - P), whichever is smaller, is greater than 5, then

    z = \frac{p - a}{\sqrt{a(1 - a)/n}}        (14.26)

is approximately normally distributed over random samples of size n with
mean 0 and standard deviation 1 when H_0 is true. If P equals a value b
different from a, p will, of course, be approximately normally distributed
with mean b and standard deviation \sqrt{b(1 - b)/n}, provided nb or
n(1 - b), whichever is smaller, is greater than about 5.

e. To test H_0 against H_1 at the α-level of significance, the value of z in
Eq. (14.26) is compared with the 100(α/2) and 100[1 - (α/2)] percentiles in
the unit normal distribution: _{α/2}z and _{1-(α/2)}z.

f. The 95% and 99% confidence intervals on P around p can be found
most easily by reference to Table P in Appendix A. For large samples, the
probability is approximately 1 - α that

    p \pm {}_{1-(\alpha/2)}z \sqrt{p(1 - p)/n}

will include P between its limits.

g. The superintendent of a school district wishes to take a poll one
month before a city election to determine the chances of the proposed school
bond receiving a majority of the votes that will be cast. The hypothesis
to be tested at the .01 level of significance is that P, the proportion of the
25,000 registered voters favoring the school bond, is .50; the alternative
hypothesis is that P is not .50. A random sample of n = 100 registered
voters is drawn; upon questioning, f = 42 voters indicated that they favored
the school bond.

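The value of p is f/n = .42. The value of z in Eq. (14.26) is

    z = \frac{p - a}{\sqrt{a(1 - a)/n}} = \frac{.42 - .50}{\sqrt{.50(.50)/100}} = -1.60.

The z value of -1.60 is compared with the critical values _{.005}z = -2.58
and _{.995}z = 2.58. It cannot be concluded that p = .42 is significantly
different from .50 at the .01 level. In fact, even the .05 level of significance
would not have led to rejecting H_0: P = .50, since -1.60 lies above the
critical value of -1.96. A confidence interval on P around p = .42 can be
found from Table P.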
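
The school-bond example can be reproduced with a few lines. The sketch below is illustrative only: the function names are hypothetical, and the interval it prints is the large-sample approximation p ± z·sqrt(p(1 - p)/n), which need not agree exactly with an interval read from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(f, n, a):
    """z of Eq. (14.26) for H0: P = a."""
    p = f / n
    return (p - a) / sqrt(a * (1 - a) / n)


def large_sample_interval(f, n, alpha=0.01):
    """Approximate 100(1 - alpha)% interval on P for large n."""
    p = f / n
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = crit * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


z = proportion_z(42, 100, 0.50)
lower, upper = large_sample_interval(42, 100, alpha=0.01)
print(f"z = {z:.2f}")
print(f"approximate 99% interval: ({lower:.2f}, {upper:.2f})")
```

That the interval contains .50 is consistent with the failure to reject H_0 at the .01 level.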


14.13 

INFERENCES ABOUT P_1 - P_2
USING INDEPENDENT
SAMPLES


a. In populations 1 and 2 the proportions of persons possessing a certain
characteristic are P_1 and P_2. The hypothesis to be tested is that
P_1 = P_2 against the alternative hypothesis that P_1 ≠ P_2:

    H_0: P_1 = P_2
    H_1: P_1 ≠ P_2.

For example, one might test the hypothesis that the proportion P_1 of
students receiving vocational counseling who plan to attend college is
equal to the proportion P_2 of students not receiving vocational counseling
who plan to attend college.

b. A random sample of size n_1 is drawn from population 1, and an
independent random sample of size n_2 is drawn from population 2.

c. The number of persons f_1 in the sample from population 1 possessing
the characteristic being observed is found, and the proportion is
p_1 = f_1/n_1. In the sample from population 2, f_2 persons possess the
characteristic in question (planning to attend college, say); the proportion
is p_2 = f_2/n_2. The following test statistic is defined:

    z = \frac{p_1 - p_2}{\sqrt{\left(\frac{f_1 + f_2}{n_1 + n_2}\right)\left(1 - \frac{f_1 + f_2}{n_1 + n_2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}        (14.27)

The quantity (f_1 + f_2)/(n_1 + n_2) is the proportion of persons in both
samples 1 and 2 that possess the characteristic of interest. If the null
hypothesis is true, then P_1 and P_2 equal a common value P that is estimated
by (f_1 + f_2)/(n_1 + n_2). And (f_1 + f_2)/(n_1 + n_2) multiplied by 1 minus the
same quantity is an estimate of the variance of X, a dichotomously scored
variable with mean P. Hence, z in Eq. (14.27) bears a resemblance to

    z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_x^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

that would be used to test the hypothesis that μ_1 = μ_2.


d. If H_0: P_1 = P_2 is true and if, for both populations 1 and 2, n_1 P_1
[or n_1(1 - P_1), whichever is smaller] and n_2 P_2 [or n_2(1 - P_2), whichever is
smaller] are greater than 5, then z in Eq. (14.27) has an approximately normal
distribution with mean 0 and standard deviation 1 over repeated pairs of
independent samples.

e. To test H_0 against H_1 at the α-level of significance, the single
calculated value of z in Eq. (14.27) is compared with _{α/2}z and
_{1-(α/2)}z, the 100(α/2) and 100[1 - (α/2)] percentiles in the unit normal
distribution.


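f. For large values of n_1 and n_2 (both equal to 100 or more, perhaps),
the 100(1 - α)% confidence interval on P_1 - P_2 is given approximately by

    (p_1 - p_2) \pm {}_{1-(\alpha/2)}z \sqrt{\left(\frac{f_1 + f_2}{n_1 + n_2}\right)\left(1 - \frac{f_1 + f_2}{n_1 + n_2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.        (14.28)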


g. A group of 200 students is randomly divided into two groups of 100
students each. Students in sample 1 are required to study instructional
materials in which the concept of transitivity of the relationship “taller
than” is first stated verbally then followed by several examples. In the
instructional materials for sample 2, the examples are given first and are
followed by the verbal statement of the concept. Underlying these two
samples are hypothetical populations 1 and 2 of students who could have been
selected to participate in the experiment. After studying the instructional
materials, students in both samples are given one test item to determine
whether they have mastered the transitivity concept. We wish to test whether
the proportions P_1 and P_2 of students in the hypothetical populations who
master the concept (as evidenced by correctly answering the test item) are
equal.

Suppose we choose α to be .05 and that at the end of the experiment
f_1 = 68 of the students in sample 1 mastered the concept and f_2 = 54 of the
students in sample 2 mastered it. The value of z in Eq. (14.27) is

    z = \frac{(68/100) - (54/100)}{\sqrt{\left(\frac{68 + 54}{100 + 100}\right)\left(1 - \frac{68 + 54}{100 + 100}\right)\left(\frac{1}{100} + \frac{1}{100}\right)}} = 2.03.

A z value of 2.03 exceeds the critical value of _{.975}z = 1.96 and, thus,
constitutes evidence to reject H_0: P_1 = P_2. We can conclude in this
hypothetical example that stating the concept first and then presenting examples
of it is superior to the reverse order.
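
A minimal sketch of Eq. (14.27) follows, using the pooled proportion exactly as defined above. The function name `two_proportion_z` is hypothetical.

```python
from math import sqrt


def two_proportion_z(f1, n1, f2, n2):
    """z of Eq. (14.27) for H0: P_1 = P_2 with independent samples."""
    pooled = (f1 + f2) / (n1 + n2)               # estimate of the common P
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (f1 / n1 - f2 / n2) / se


# Transitivity example: 68 of 100 versus 54 of 100 mastered the concept
z = two_proportion_z(68, 100, 54, 100)
print(f"z = {z:.2f}")   # compared with the critical values -1.96 and 1.96
```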


h. Appropriate techniques for making inferences concerning a set of J
population proportions (in this section J = 2) exist. For example, one could
be interested in the null hypothesis that in a large urban school system the
proportion of Negroes, Puerto Ricans, and Orientals (J = 3) who leave
school before graduation is the same. For a discussion of these techniques,
see Marascuilo (1966) first and then Goodman (1964).


14.14 

INFERENCES ABOUT P_1 - P_2
USING DEPENDENT SAMPLES


a. The hypotheses to be tested are identical to those in Sec. 14.13,
namely, H_0: P_1 = P_2 against H_1: P_1 ≠ P_2.


b. Two random samples both of size n are drawn from populations 1
and 2, respectively. In contrast to the procedures in Sec. 14.13, we do not
require the two samples to be independent. Thus samples 1 and 2 could
comprise matched pairs, twin mates, “before” and “after” observations, etc.
The most frequent application of the procedures in this section is to problems
in which samples 1 and 2 are the same group of persons observed at two
different points in time. Thus, it has sometimes been referred to as a
test of the “significance of change,” but we shall see that this interpretation
is not entirely correct.


c. As in all techniques involving dependent samples, it is possible
to establish n pairs of observations, one member of each pair from sample 1
and the other from sample 2. Without loss of generality, we shall consider
samples 1 and 2 to be observations of n persons at times 1 and
2, respectively. The number of persons scoring 1 on the dichotomous
variable, i.e., having the characteristic being observed, in sample 1 is f_1 and
p_1 = f_1/n. In sample 2, f_2 persons have the characteristic, and
p_2 = f_2/n. It is also necessary to determine the number of pairs of
observations in which both members of the pair (one from sample 1 and one
from sample 2) score 1, i.e., possess the attribute being observed. Such data
can be tabulated in a 2 x 2 contingency table as shown in Fig. 14.5.

                        Sample 1
                      0          1
    Sample 2   1      a          b        f_2
               0      c          d        n - f_2
                   n - f_1      f_1       n

FIG. 14.5

For example, b might be the number of persons out of n who possess 
the characteristic being observed at both times 1 and 2. 

The following test statistic can be used to test $H_0$ against $H_1$:

$$z = \frac{d - a}{\sqrt{d + a}} \qquad (14.29)$$


d. When $H_0: P_1 = P_2$ is true, z in Eq. (14.29) is approximately normally
distributed with mean 0 and standard deviation 1, provided that d + a is
as large as 10.


e. The critical values against which z in Eq. (14.29) is compared in
testing $H_0$ at the α-level of significance are $_{\alpha/2}z$ and $_{1-(\alpha/2)}z$.

f. The question of placing a confidence interval on $P_1 - P_2$ will not be
discussed.


g. A sample of n = 60 persons is asked to indicate whether or not they
approve of capital punishment both before and after being exposed to a 
persuasive lecture on the abolition of capital punishment. The sample of 
60 pre-lecture responses to the question on capital punishment constitutes 
sample 1; the 60 post-lecture responses constitute sample 2. The data 
obtained are tabulated in Fig. 14.6. 

As an example of how the table in Fig. 14.6 is interpreted, one sees that 
26 persons approved of capital punishment before hearing the lecture and 
disapproved after the lecture. 

The null hypothesis that $P_1 = P_2$ will be tested at the .05 level of
significance. The value of z in Eq. (14.29) is

$$z = \frac{d - a}{\sqrt{d + a}} = \frac{10 - 26}{\sqrt{10 + 26}} = \frac{-16}{6} = -2.67.$$

A z value of -2.67 lies considerably below the critical value of $_{.025}z = -1.96$.
(Note that d + a is far greater than 10.) Hence, the hypothesis
that in the populations sampled (persons who had not heard an abolition 
of capital punishment lecture and persons who had) the proportions endorsing 
capital punishment are equal can be rejected at the .05 level of significance. 

                                    Sample 1 (pre-lecture)
                                  Approve       Disapprove
    Sample 2        Disapprove    a = 26         b = 8          34
    (post-lecture)  Approve       c = 16         d = 10         26
                                    42             18         n = 60

FIG. 14.6
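The arithmetic of the capital punishment example can be checked with a few lines of code. The following is only a minimal sketch of Eq. (14.29); the function name `dependent_proportions_z` is an illustrative choice, not part of the text, and the critical value 1.96 is the two-tailed .05 value used above.

```python
import math


def dependent_proportions_z(a: int, d: int) -> float:
    """Return z of Eq. (14.29): (d - a) / sqrt(d + a).

    a and d are the two cells of the 2 x 2 table in which the
    members of a pair disagree (the "changers").
    """
    if a + d <= 0:
        raise ValueError("d + a must be positive")
    return (d - a) / math.sqrt(d + a)


# Cell counts taken from Fig. 14.6.
z = dependent_proportions_z(a=26, d=10)

# Two-tailed test at the .05 level: reject if |z| exceeds 1.96.
reject_at_05 = abs(z) > 1.96

print(round(z, 2), reject_at_05)
```

Only the two cells a and d enter the statistic; the cells b and c, in which both members of a pair agree, play no part in it.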


h. The test outlined above must not be considered equivalent to testing 
the significance of “change.” Notice that in the table in Fig. 14.7 the over- 
whelming changes taking place are apparent even though the test statistic 
in Eq. (14.29) is equal to zero. 



The significance test of this section is a test of the significance of the
difference between $p_1 = f_1/n$ and $p_2 = f_2/n$ and not of a hypothesis about
change. Change is evaluated by the significance test presented here only
as it may be reflected in the difference between $P_1$ and $P_2$.

The techniques presented in this section are due to McNemar (1947). 
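The point of item h can be made concrete with a small computation. The counts below are hypothetical (they are an assumption for illustration and are not the entries of Fig. 14.7); the cell labels a, b, c, d follow the layout of Fig. 14.5.

```python
import math

# Hypothetical 2 x 2 table for n = 100 persons (labels as in Fig. 14.5):
# a and d count persons whose response changed, b and c those who did not.
a, b, c, d = 40, 0, 20, 40
n = a + b + c + d

# Sample proportions at time 1 and time 2 (column "1" and row "1" totals).
p1 = (b + d) / n
p2 = (a + b) / n

# Test statistic of Eq. (14.29).
z = (d - a) / math.sqrt(d + a)

changed = a + d
print(p1, p2, z, changed)
```

Here 80 of the 100 persons change their response, yet z is zero because the two kinds of change cancel and the two sample proportions are identical.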

FIG. 14.7 Responses (Agree or Disagree) of n = 100 persons at Time 1 and Time 2.


14.15 

INFERENCES ABOUT THE 
INDEPENDENCE OF 
CLASSIFICATIONS IN A 
CONTINGENCY TABLE 


Often observations are made with nominal measurement of each
unit (person, group, etc.) with respect to two modes of classification. For
example, students can be classified with respect to sex (male-female)
and academic major (humanities, social sciences, physical sciences, engineering,
etc.), or adults can be classified with respect to their political
affiliation and their sex. Such data can be tabulated in a contingency table as
shown in Fig. 14.8.

As an example of how the table is interpreted, one sees that 14 of the 120
persons were female Democrats.

In general, we can denote the number of rows by I and the number of
columns by J. We shall denote the number of observations in the cell at the
intersection of the ith row and jth column by $f_{ij}$. For example, in Fig. 14.9
the number of observations in row 2 and column 1 is $f_{21}$.
In Fig. 14.8, $f_{12} = 36$, $f_{1.} = 80$, $f_{.3} = 17$,
and $f_{..} = n = 120$.

a. The hypothesis to be tested is that the two modes of classification
upon which the observations are made are independent. This notion ...

                          Political affiliation
                  Democrat    Republican    Independent
    Sex   Male       29           36            15            80
          Female     14           24             2            40
                     43           60            17           120

FIG. 14.8 Contingency table showing the relationship between sex and
political affiliation.

... Republicans, and Independents in the population will be denoted by $P_{.1}$, $P_{.2}$,
and $P_{.3}$, respectively. If a sample of size n is drawn strictly at random
from the population, these proportions can be interpreted as probabilities
of obtaining males, females, or any one of the three political affiliations.

It was seen in Chapter 10 that the probability of the joint occurrence
of two independent events is the product of their separate probabilities.
For example, if sex and political affiliation are independent of each other,
then the probability that an adult randomly selected from the population is
a female Democrat is $P_{2.}P_{.1}$; the probability of randomly drawing a male
Republican is $P_{1.}P_{.2}$, if the two classifications are independent. Testing the
null hypothesis of independence is equivalent to testing the hypothesis that
the probability of drawing a person who falls into cell ij of the contingency
table is equal to the product of the probability that the person belongs to
any cell in row i and the probability that he belongs to any cell in column j:

$H_0: P_{ij} = P_{i.}P_{.j}$ for all values of i and j.

The alternative hypothesis, $H_1$, is that $P_{ij} = P_{i.}P_{.j}$ is not true for at
least one of the IJ cells in the contingency table.

b. All that is assumed is that a random sample of size n is drawn from 
the population in question. 


c. The following test statistic is used to test $H_0$ against $H_1$:

$$\chi^2 = n\left(\sum_{i=1}^{I}\sum_{j=1}^{J}\frac{f_{ij}^2}{f_{i.}\,f_{.j}} - 1\right) \qquad (14.30)$$

where $f_{ij}$ is the number of observations in the (ij)th cell of the contingency
table,
$f_{i.}$ is the number of observations in the ith row of the table,
$f_{.j}$ is the number of observations in the jth column of the table, and
$f_{..} = n$ is the total number of observations.

d. When the null hypothesis of independence of the two classifications
is true, the statistic $\chi^2$ in Eq. (14.30) has a chi-square distribution with
(I - 1)(J - 1) degrees of freedom over repeated random samples of size n
from the population. For example, if sex and political affiliation are
independent in the population from which the sample in Fig. 14.8 was drawn,
then $\chi^2$ for the data in that figure should appear to be a typical observation
from a chi-square distribution with (I - 1)(J - 1) = (2 - 1)(3 - 1) = 2
degrees of freedom.

If $H_0$ is false, $\chi^2$ in Eq. (14.30) will tend to be larger than typical values
from the $\chi^2_{(I-1)(J-1)}$ distribution. In other words, nonindependence can be
expected to produce large values of $\chi^2$ in Eq. (14.30).

e. To test $H_0$ at the α-level of significance, the single computed value of
$\chi^2$ in Eq. (14.30) is compared with the 100(1 - α) percentile point in the
chi-square distribution with (I - 1)(J - 1) degrees of freedom, i.e., in the
distribution $\chi^2_{(I-1)(J-1)}$. This percentile point is denoted by $_{1-\alpha}\chi^2_{(I-1)(J-1)}$.
Selected percentile points in the chi-square distributions are presented in
Table C of Appendix A.

f. The hypothesis tested is one involving several parameters (population 
proportions) and the question of interval estimation of some function of 
these does not arise. 

g. The data in Fig. 14.8 will be used as an illustration. A random
sample of n = 120 is drawn from the population of adults in a particular
community. Each person is classified with respect to sex and political
affiliation. The hypothesis of independence of the two modes of classification
will be tested at the .01 level of significance.

Substituting the data in Fig. 14.8 into the formula for $\chi^2$ in Eq. (14.30)
yields the following value:

$$\chi^2 = 120\left(\frac{29^2}{80\cdot 43} + \frac{36^2}{80\cdot 60} + \frac{15^2}{80\cdot 17} + \frac{14^2}{40\cdot 43} + \frac{24^2}{40\cdot 60} + \frac{2^2}{40\cdot 17} - 1\right) = 4.776.$$

The obtained value of $\chi^2$ = 4.776 is compared with the critical value of
$_{.99}\chi^2_2 = 9.210$. The null hypothesis of independence cannot be rejected at
the .01 level. However, the probability of obtaining a value of $\chi^2$ as large
as 4.776 or larger when the null hypothesis is true is slightly less than .10.

The evidence for an association other than chance between sex and political
affiliation is rather weak.
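The computation for the sex by political affiliation table can be reproduced in code. This is a minimal sketch, not a general-purpose routine: the function name `chi_square_independence` is an assumption made for illustration, the statistic is the one implied by the worked substitution above (n times the sum of $f_{ij}^2/(f_{i.}f_{.j})$, minus n), and the tail probability uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic for an I x J table of counts, with its df.

    Computed as n * (sum over cells of f_ij^2 / (row total * column total) - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = sum(
        f * f / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return n * (total - 1), df


# Counts from Fig. 14.8 (rows: male, female; columns: Dem., Rep., Ind.).
fig_14_8 = [[29, 36, 15], [14, 24, 2]]
chi2, df = chi_square_independence(fig_14_8)

# Upper-tail probability; exp(-x/2) is exact for df = 2 ONLY.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), df, round(p_value, 3))
```

Carried to full precision the statistic comes out at about 4.77, close to the 4.776 reported above; the small difference presumably reflects rounding in the hand calculation and does not affect the conclusion, since either value is well below 9.210 and has a tail probability slightly under .10.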

We shall consider a second illustration in which I = J = 2. A sample
was drawn at random from the population of first-year graduate students in
several large universities by the Committee on the Undergraduate Program
in Mathematics of the Mathematical Association of America. The n = 259
students in either psychology or sociology were classified as having either
“no credit” or “some credit” in undergraduate mathematics. The contingency
table shown in Fig. 14.10 was obtained.

                                     Academic area
                               Psychology       Sociology
    Undergraduate    None      f_11 = 25        f_12 = 34       f_1. = 59
    mathematics      Some      f_21 = 151       f_22 = 49       f_2. = 200
                               f_.1 = 176       f_.2 = 83       f_.. = 259

FIG. 14.10

The null hypothesis to be tested is that there is no association between
academic area and undergraduate mathematics training. In a 2 x 2 contingency
table, the formula for $\chi^2$ in Eq. (14.30) can be simplified considerably:

$$\chi^2 = \frac{n(f_{11}f_{22} - f_{12}f_{21})^2}{f_{1.}\,f_{2.}\,f_{.1}\,f_{.2}} \qquad (14.31)$$

The value of $\chi^2$ with (I - 1)(J - 1) = (2 - 1)(2 - 1) = 1 degree of freedom
for the data on undergraduate mathematics training is found by Eq. (14.31) as
follows:

$$\chi^2 = \frac{259(25\cdot 49 - 34\cdot 151)^2}{59\cdot 200\cdot 176\cdot 83} = 22.96.$$

Thus we reject the hypothesis that academic area (psychology vs.
sociology) and undergraduate mathematics training (no credit vs. some
credit) are independent. We can conclude that there is a tendency for first-year
psychology graduate students to have more undergraduate mathematics
training than first-year sociology graduate students.
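For a 2 x 2 table the usual shortcut formula, n times the squared difference of cross-products divided by the product of the four marginal totals, gives the same value as the general chi-square computation. The sketch below checks this on the counts of Fig. 14.10; the function names are illustrative assumptions, not the text's own.

```python
def chi_square_2x2(f11: int, f12: int, f21: int, f22: int) -> float:
    """Shortcut chi-square for a 2 x 2 table (1 degree of freedom)."""
    n = f11 + f12 + f21 + f22
    numerator = n * (f11 * f22 - f12 * f21) ** 2
    denominator = (f11 + f12) * (f21 + f22) * (f11 + f21) * (f12 + f22)
    return numerator / denominator


def chi_square_general(table) -> float:
    """General form: n * (sum of f_ij^2 / (row total * column total) - 1)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    s = sum(f * f / (rows[i] * cols[j])
            for i, r in enumerate(table) for j, f in enumerate(r))
    return n * (s - 1)


# Counts from Fig. 14.10 (rows: no credit, some credit;
# columns: psychology, sociology).
chi2_short = chi_square_2x2(25, 34, 151, 49)
chi2_long = chi_square_general([[25, 34], [151, 49]])

print(round(chi2_short, 2), round(chi2_long, 2))
```

Both routes give 22.96 when rounded to two decimals, which is the value obtained in the worked example.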

h. Many considerations bear on the use of the test of independence
of the classifications in a contingency table presented in this section. Space
permits only the mention of several considerations in the application
of this technique and the presentation of references in which they are
examined.

When the expected frequency in any cell of a 2 x 2 contingency
table is small (less than 10), a correction to the formula for $\chi^2$ in Eq.
(14.31) due to Yates (1934) is advisable. The purpose of Yates's continuity
correction is to improve the correspondence of the sampling distribution of
$\chi^2$ to the chi-square distribution with one degree of freedom. For discussions
of Yates's correction in textbooks in applied statistics see McNemar (1962),
Ferguson (1966), or Hays (1963), among others.





There exists a conceptually distinct interpretation of the chi-square test 
of this section that is appropriate when the persons in the different rows (or 
columns) of the contingency table can be regarded as sampled from separate 
and distinct populations. The chi-square contingency table test presented 
above may then be regarded as a test of homogeneous populations. For a 
discussion of the chi-square test of homogeneous populations, see Keeping 
(1962) and Guenther (1965). 

Techniques exist whereby the hypothesis of independence of three modes 
of classification in a three-way contingency table may be tested. See Tate 
and Clelland (1957). 

The problem of determining which subset of cells in a contingency
table contributes to a significant $\chi^2$ statistic has been dealt with by Marascuilo
(1966).

Finally, a valuable series of papers on the use and misuse of the chi- 
square contingency table test appears in the Psychological Bulletin : Lewis 
and Burke (1949), Edwards (1950), Lewis and Burke (1950), Pastore (1950), 
Peters (1950), Burke (1951). 


14.16 

RELATIONSHIP BETWEEN 
INTERVAL ESTIMATION AND 
HYPOTHESIS TESTING 

A relationship exists in most instances between methods of interval estimation
and hypothesis testing that allows one to determine from inspection of a
100(1 - α)% confidence interval what the results would be of a hypothesis
test at the α-level of significance. For example, if the 95% confidence interval
around $\bar{X}$ on μ includes 0, then the hypothesis $H_0: \mu = 0$ cannot be rejected
at the .05 level. If the 100(1 - α)% confidence interval around r on ρ
includes 0, then the hypothesis $H_0: \rho = 0$ cannot be rejected at the α-level of
significance. If the 100(1 - α)% confidence interval around $\bar{X}_1 - \bar{X}_2$ does
not include 0, then the hypothesis $H_0: \mu_1 - \mu_2 = 0$ can be rejected at the
α-level. In general, all values along a particular 100(1 - α)% confidence
interval would lead to acceptance at the α-level of the null hypothesis that the
parameter being estimated was equal to any one of those values. Conversely,
any value outside the confidence interval would lead to rejection of the hypothesis
that the parameter in question was equal to that value. For
example, if the 99% confidence interval around a particular $\bar{X}$ on μ extended
from -4.86 to 8.41, then the same data would lead to rejection of any
hypothesis that μ was any number less than -4.86 or greater than 8.41;
if it had been hypothesized that μ was 0, the data would lead to acceptance
of this hypothesis at the .01 level.

Suppose that the 95% confidence interval on μ is being established
around $\bar{X}$ by means of Eq. (14.2). Suppose further that the entire confidence
interval is above zero. Hence, the lower limit of the confidence interval,
namely $\bar{X} - {}_{.975}t_{n-1}\,s_x/\sqrt{n}$, is greater than zero:

$$\bar{X} - {}_{.975}t_{n-1}\frac{s_x}{\sqrt{n}} > 0.$$

It follows that

$$\bar{X} > {}_{.975}t_{n-1}\frac{s_x}{\sqrt{n}}$$

and that

$$\frac{\bar{X}}{s_x/\sqrt{n}} > {}_{.975}t_{n-1}.$$

The last inequality states that the test statistic [see Eq. (14.1)] for
testing $H_0: \mu = 0$ exceeds the 97.5 percentile in the t-distribution
with n - 1 degrees of freedom. Thus, the hypothesis that μ = 0 can be rejected at
the 1 - .95 = .05 level of significance. The confidence interval contained the
information necessary to make a hypothesis test as well.
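The agreement between the interval and the test can be seen numerically. The sketch below uses ten hypothetical scores (an assumption made purely for illustration) and the tabled 97.5 percentile of the t-distribution with 9 degrees of freedom, 2.262; the variable names are likewise illustrative.

```python
import math
import statistics

# Hypothetical sample of n = 10 scores (not data from the text).
scores = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 0.7, 2.5, 3.8, 1.5]

# 97.5 percentile of the t-distribution with n - 1 = 9 degrees of freedom.
T_975_DF9 = 2.262

n = len(scores)
mean = statistics.mean(scores)
s = statistics.stdev(scores)           # uses n - 1 in the denominator
standard_error = s / math.sqrt(n)

# 95% confidence interval on the population mean.
lower = mean - T_975_DF9 * standard_error
upper = mean + T_975_DF9 * standard_error

# Test statistic for the hypothesis that the population mean is 0.
t = mean / standard_error

ci_excludes_zero = lower > 0 or upper < 0
test_rejects = abs(t) > T_975_DF9

print(round(lower, 2), round(upper, 2), round(t, 2))
print(ci_excludes_zero, test_rejects)
```

Whatever scores are substituted, `ci_excludes_zero` and `test_rejects` agree, because the two conditions are algebraic rearrangements of one another.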


PROBLEMS AND EXERCISES


1. ... Inventory scores for a sample of athletes and nonathletes.
His findings are summarized below:

   Athletes: $n_1 = 14$ ...
   Nonathletes: $n_2 = 28$ ...
   ... 119.54 ...

   ... that in the populations of athletes and nonathletes sampled ...

2. ... determine if the presence of pictures facilitated or interfered with
young children's learning of words. Twenty pre-first-grade children were ...
to either learn words which were illustrated with simple pictures or learn
the same words without pictures. After several learning trials each child
was tested on his knowledge of the words taught. The number of correct
responses ... for each group had the following means and standard deviations:

   No-picture group: n = 10 ...
   Picture group: ...

   Test the hypothesis that the two groups can be considered
samples from two populations with the same mean.

3. Herman (1967) studied the relationship between grade-point averages (X) in
professional education courses and scores on the Minnesota Teacher Attitude
Inventory (Y) on a sample of 42 prospective physical education teachers. An
$r_{xy}$ of +.19 was obtained. Construct the 99% confidence interval around this
value of $r_{xy}$ on $\rho_{xy}$. Does the interval include zero?

4. Yamamoto (1967) correlated IQ as measured by the Lorge-Thorndike
Intelligence Test and creativity as measured by the Minnesota Tests of Creative
Thinking on a sample of n = 75 ninth-grade pupils. An $r_{xy}$ of .12 was obtained.

a. Test the hypothesis $H_0: \rho_{xy} = 0$ at the .05 level of significance.

b. Yamamoto also obtained a correlation between IQ and creativity of —.01 
for n = 84 eleventh-grade pupils. Test the hypothesis that the population 
correlation coefficients between IQ and creativity are equal in the populations 
of ninth-grade and eleventh-grade pupils sampled. Test this hypothesis at 
the .05 level of significance. 

5. Wallen and Campbell (1967) intercorrelated scores on the Miller Analogies
Test (X), the Wide-Range Vocabulary Test (Y), and the Lorge-Thorndike
Picture Reasoning Test (Z) on a sample of 60 graduate students. The following
sample values were obtained:

$r_{xy} = .58$     $r_{xz} = .17$     $r_{yz} = .10$

Test the hypothesis at the .05 level of significance that $\rho_{xy}$ equals $\rho_{xz}$, i.e., the
hypothesis that the Miller Analogies Test is as closely related to a vocabulary
test as it is to a nonverbal reasoning test. Interpret the results.

6. Thalberg (1967) intercorrelated intelligence (X), reading rate (Y), and reading
comprehension (Z) in a sample of n = 80 college students. The following
correlation coefficients were obtained:

         X        Y                  Z
    X             $r_{xy} = -.034$   $r_{xz} = .422$
    Y                                $r_{yz} = -.385$
    Z

Test the null hypothesis at the .05 level of significance that $\rho_{xy} = \rho_{xz}$, i.e., that
intelligence is correlated with both reading rate and reading comprehension
to the same degree.

7. North and Buchanan (1967) asked a sample of 69 teachers (29 Caucasian and 
40 Negro) whether they perceived “poor children” to be generally responsible 
or irresponsible. A phi-coefficient of correlation was calculated between the 
race of the teacher, X, and the teacher’s perception of “poor children,” Y. 

A value of φ = .24 seemed to indicate a weak relationship between the
variables; Caucasian teachers tended to perceive “poor children” as
irresponsible more than did Negro teachers. Using the techniques of Sec. 14.11, test
the hypothesis at the .10 level of significance that in the population of teachers
sampled “teacher’s race” and “perception of ‘poor children’ as irresponsible” 
are uncorrelated. 

8. Stennet (1967) sampled kindergarten pupils from rural Minnesota. In a random
sample of $n_1$ = 873 boys, the proportion who were absent from school 20 or ...


15.1

LAYOUT OF DATA

... a researcher wants to know ... Ten students will study the English
money system by outlining the topic. A different 10 students will study
... of the topic; 10 other students will study a textbook ... of the same material;
the remaining 10 students will study the programmed material on a teaching
machine. The four study conditions (treatments) represent increasing
observable activity by the learner. ... the experiment is
“learner activity,” a concept that relates the four conditions.
The treatments of the experiment ... or amounts of that thing
called the factor. A factor might be “size of type,” and
the three treatments comprising ... might be a passage printed in 6
point, 8 point, and ... The researcher wants to know if at least
two of the study conditions ...



                        Treatment
         1            2            3            4
      X_{1,1}      X_{1,2}      X_{1,3}      X_{1,4}
      X_{2,1}      X_{2,2}      X_{2,3}      X_{2,4}
        ...          ...          ...          ...
      X_{10,1}     X_{10,2}     X_{10,3}     X_{10,4}

FIG. 15.1 Layout of data from an experiment comparing four levels of
learner activity.


The data, multiple-choice test scores of the 40 students, can be tabulated
as in Fig. 15.1.

You may wonder why the researcher doesn’t simply look at the sums
of the four treatments and see if any two are different. He does not do this
because his attention doesn’t focus only on the scores obtained for these
particular persons at this particular time. What good would his results be
if all he could say was that this group of 10 students did better than this other
group on September 11 at this place? Although he has data on these 40
students taken at one particular time, if his experiment is to contribute any
knowledge to science, his interest must focus on the populations of students
from which these 40 are only a sample and on the population of experiments
of which this is only one. The researcher seeks to answer the question,
“Could I expect these same results if I had chosen a different 40 subjects
and run the experiment at a different time under slightly different conditions?”


15.2 

A MODEL FOR THE DATA 

To answer this question is to make an inference to a population of students 
and a population of performances of the experiment. The researcher uses 
the methods of inferential statistics to answer the question. These methods 
are powerful tools for experimentation. Like many “servants” of man, 
however, they ask that certain concessions be made to them. The methods 
presented in this chapter are appropriate in making inferences to a hypo- 
thetical population of trials of the experiment if the population of scores, 
from which the 40 scores obtained are considered to have been sampled, 
has a certain form and the sampling is done in a certain way. We may at
times speak as though the researcher samples a person when in fact the person's
score on the test is that thing assumed to have been sampled from a certain
population. As an example of the type of demands inferential statistical
methods will make, the researcher must assume that he has drawn the 40
obtained scores at random (i.e., with equal and independent probabilities of
being chosen) from four (one for each level of the treatment) normal
distributions of scores with equal variances $\sigma^2$. These assumptions may appear
quite restrictive; it will be pointed out later (Sec. 15.13) that certain violations
of the assumptions have little effect on the results of the statistical analysis.
Nevertheless, the assumptions should be remembered:


1. The scores were sampled at random

2. from normal populations

3. with equal variances $\sigma^2$,

4. and the different samples (four in our example) are independent.


Notice that no assumption is made about the means μ of the normal
populations. It will be seen that the researcher’s question about the efficiency
of the four methods will become a question about the four means, $\mu_1, \ldots, \mu_4$,
of the four normal populations.

We must now phrase the question about the efficiencies of the four
methods in more precise terms. Toward this end, we assume that any one
of the 40 scores can be represented by a linear model, the sum of components
(none of which has to be squared, cubed, etc.). The linear model is a
decomposition of a score $X_{ij}$ into a sum of terms that have a certain meaning
to the statistician. We shall “tentatively entertain” the following linear
model for the $X_{ij}$'s:

$$X_{ij} = \mu + \alpha_j + e_{ij}, \qquad (15.1)$$

where $X_{ij}$ is the ith score in the jth group,

μ is a constant (equal to the average of the four population means) that
is the same for all 40 scores and reflects the overall elevation of the
scores,

$\alpha_j$ is a constant for all 10 scores in group j and reflects the elevation
or depression of the scores in that group, and

$e_{ij}$ is the error ... that would be present even if all the persons (within
a group) were treated alike ...

... but $\alpha_j$ reflects what effect level j of the treatment ... The researcher’s
original question, “Do the four methods differ?” can now be stated more precisely:
Is it not true that $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$? He wishes to know if he can safely
say that the four levels of the treatment do not have the same effect. The
methods to be developed will be appropriate for testing the hypothesis,
called the null hypothesis (because there are no differences), that
$\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$.

One needs a method of testing this hypothesis because such things as
the $e_{ij}$'s exist. If there were no error in the linear model, the researcher
could simply look at the sums of the scores in the four groups to see if they
were different and thus test his null hypothesis. But the $e_{ij}$'s are there and
they must be dealt with. How do the $e_{ij}$'s arise? They come about in various
ways. First, persons or whatever is measured are inherently different even
when they are treated alike. Of the 10 students who outline the English
money material, some will score higher on the test than others simply because
they are more intelligent. No amount of control over the conditions of
study would result in all 10 students achieving the same degree of knowledge.
Second, errors arise when an attempt is made to measure the students.
These errors are due to unreliability of the measuring instrument, a
multiple-choice test in this instance. A group of persons will not all earn the same
test score today that they will tomorrow even if there were no forgetting.
Third, errors arise from all manner of uncontrolled happenings during the
experiment. A student may break his pencil and have to obtain a new one;
someone may feel ill and not do well; someone else may not like the researcher’s
look and act uncooperatively. Additional error arises if the postulated linear
model is not correct; it’s possible that no linear model will give an exact
description of the data.

All of these sources of error combine to make the results of the experi- 
ment today different from what would have been obtained yesterday or 
tomorrow on different groups of 40 students. The results of the experiment 
run at any one time will undoubtedly show differences between the four 
groups. It remains to be established, however, that the groups would be 
different if the experiment had been run on a different randomly chosen group 
of students under different conditions. 

Let’s go back to the original data and see how the null hypothesis that
$\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$ can be tested. First we must develop some machinery.

The sum of the scores under level one is denoted by

$$\sum_{i=1}^{10} X_{i1}.$$

The sum of the scores divided by 10, the number of scores summed, is the
arithmetic mean of group one. The mean will be denoted as follows:

$$\bar{X}_{.1} = \frac{\sum_{i=1}^{10} X_{i1}}{10}.$$

As you’ll recall, the bar above the X indicates a mean; the dot in place of
i means that the mean was found by summing over i. How do you find $\bar{X}_{.2}$?

We also need a grand mean, the mean of all 40 scores. To find the
grand mean, we sum all 40 scores and divide by 40. The grand mean is
denoted by

$$\bar{X}_{..} = \frac{\sum_{j=1}^{4}\sum_{i=1}^{10} X_{ij}}{40}.$$

Once again, the bar above X indicates a mean; the two dots in place of
the i and j indicate that both i and j have been summed over.

If one takes the difference between each score in group one and $\bar{X}_{.1}$,
squares this difference, and sums the squared differences, this sum has the
form

$$\sum_{i=1}^{10}(X_{i1} - \bar{X}_{.1})^2.$$

This sum divided by 9, (n - 1), is the sample variance for group one and is
denoted by $s_1^2$:

$$s_1^2 = \frac{\sum_{i=1}^{10}(X_{i1} - \bar{X}_{.1})^2}{9}.$$

$s_1^2$ is an unbiased estimator of the variance of the population underlying
group one. Since this variance is also the variance of the three other populations,
$s_1^2$ is an unbiased estimator of this common variance $\sigma^2$. $s_2^2$, $s_3^2$, and $s_4^2$ are also
unbiased estimators of $\sigma^2$. What does $s_3^2$ stand for?

$$s_3^2 = \frac{1}{9}\sum_{i=1}^{10}(X_{i3} - \bar{X}_{.3})^2.$$

Now we have the machinery necessary to continue. Remember, we
want to know if we can safely say that it is not true that
$\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$.


15.3

ESTIMATES OF THE TERMS
IN THE MODEL

All that the researcher has in hand are the 40 $X_{ij}$'s. The terms μ, $\alpha_j$,
and $e_{ij}$ of the linear model cannot be known exactly, but estimates of them
are possible. The statistician has a method of estimating the components
of the linear model that has very useful properties. He obtains
what are called least-squares estimates of μ, $\alpha_j$, and $e_{ij}$. Denote these
least-squares estimates by $\hat{\mu}$, $\hat{\alpha}_j$, and $\hat{e}_{ij}$. In the process of obtaining the
least-squares estimates, the parameters of greatest interest, the $\alpha_j$'s, are
assumed to sum to zero: $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 0$. This is an entirely
reasonable sort of restriction to adopt. We have spoken of the $\alpha_j$'s as
“elevators” or “depressors” above or below a general level that is embodied
in the parameter μ. By placing this restriction on the $\alpha_j$'s, the statistician
almost predestines that the $\hat{\alpha}_j$'s will have the properties of deviations from a
mean $\hat{\mu}$, since the sum of deviations around a mean is zero. The statistician
has found that

$$\hat{\mu} = \bar{X}_{..},$$

$$\hat{\alpha}_j = \bar{X}_{.j} - \bar{X}_{..},$$

$$\hat{e}_{ij} = X_{ij} - \bar{X}_{.j}.$$

The estimates are made to fit the observed data in the sense that

$$X_{ij} = \hat{\mu} + \hat{\alpha}_j + \hat{e}_{ij}.$$

Notice that

$$X_{ij} = \bar{X}_{..} + (\bar{X}_{.j} - \bar{X}_{..}) + (X_{ij} - \bar{X}_{.j}). \qquad (15.2)$$

The first term on the right is $\hat{\mu}$, the second term is $\hat{\alpha}_j$, and the last term is $\hat{e}_{ij}$.
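The decomposition of each score into an estimated grand mean, an estimated treatment effect, and a residual can be traced on a small data set. The scores below are hypothetical (four groups of five scores rather than ten, to keep the listing short), and the variable names `mu_hat`, `alpha_hat`, and `e_hat` are illustrative choices.

```python
# Hypothetical scores: one inner list per treatment group.
groups = [
    [10, 12, 11, 13, 14],
    [14, 15, 13, 16, 17],
    [12, 11, 13, 12, 12],
    [18, 17, 19, 16, 20],
]

all_scores = [x for g in groups for x in g]

# Estimate of the grand mean: the mean of all scores.
mu_hat = sum(all_scores) / len(all_scores)

# Group means, and treatment-effect estimates as deviations from mu_hat.
group_means = [sum(g) / len(g) for g in groups]
alpha_hat = [m - mu_hat for m in group_means]

# Residuals: each score minus its own group mean.
e_hat = [[x - m for x in g] for g, m in zip(groups, group_means)]

# Rebuild every score from the three estimated components.
rebuilt = [[mu_hat + a + e for e in errs] for a, errs in zip(alpha_hat, e_hat)]

print(mu_hat, alpha_hat, sum(alpha_hat))
```

With equal group sizes the estimated effects sum to zero, mirroring the restriction placed on the treatment effects, and the three components add back to each original score.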


15.4 

SUMS OF SQUARES 

The motivation for the steps that follow won’t be clear until much later in 
this chapter; let’s proceed to do some arbitrary-looking manipulations on 
Eq. (15.2). 

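First, subtract the grand mean $\bar{X}_{..}$ from each side of Eq. (15.2):

$$X_{ij} - \bar{X}_{..} = (\bar{X}_{.j} - \bar{X}_{..}) + (X_{ij} - \bar{X}_{.j}).$$

Second, square each side of the above equation and sum both sides over j
and i:

$$\sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{..})^2 = \sum_{j=1}^{4}\sum_{i=1}^{10}\left[(\bar{X}_{.j} - \bar{X}_{..}) + (X_{ij} - \bar{X}_{.j})\right]^2. \qquad (15.3)$$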

The quantity on the left in Eq. (15.3) is called the total sum of squares. 
For the entire set of numbers obtained in an experiment, the sum of the 
squared deviation of each number from the grand mean is the total sum of 
squares. 

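Let’s consider the right side of Eq. (15.3). Notice that if we let a stand
for $(\bar{X}_{.j} - \bar{X}_{..})$ and b stand for $(X_{ij} - \bar{X}_{.j})$, then the right side involves the
quantity $(a + b)^2$. You already know that $(a + b)^2 = a^2 + 2ab + b^2$. So
it’s easy to show that

$$\sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{..})^2 = \sum_{j=1}^{4}\sum_{i=1}^{10}\left[(\bar{X}_{.j} - \bar{X}_{..})^2 + 2(\bar{X}_{.j} - \bar{X}_{..})(X_{ij} - \bar{X}_{.j}) + (X_{ij} - \bar{X}_{.j})^2\right].$$

The right side of this equation is the same as

$$\sum_{j=1}^{4}\sum_{i=1}^{10}(\bar{X}_{.j} - \bar{X}_{..})^2 + 2\sum_{j=1}^{4}\sum_{i=1}^{10}(\bar{X}_{.j} - \bar{X}_{..})(X_{ij} - \bar{X}_{.j}) + \sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{.j})^2. \qquad (15.4)$$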

Consider for a moment only the middle term in Eq. (15.4). Since the
quantity $\bar{X}_{.j} - \bar{X}_{..}$ has no i subscript, the $\sum$ sign for i can be moved past it;
thus we may write the second term as follows:

$$2\sum_{j=1}^{4}(\bar{X}_{.j} - \bar{X}_{..})\sum_{i=1}^{10}(X_{ij} - \bar{X}_{.j}). \qquad (15.5)$$

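Say for the moment that we are in group one and that we want to find

$$\sum_{i=1}^{10}(X_{i1} - \bar{X}_{.1}).$$

For this group, $\bar{X}_{.1}$ is a constant; using Rule 2 of Sec. 2.5, then, we can show
that

$$\sum_{i=1}^{10}(X_{i1} - \bar{X}_{.1}) = \sum_{i=1}^{10}X_{i1} - 10\bar{X}_{.1} = \sum_{i=1}^{10}X_{i1} - 10\,\frac{\sum_{i=1}^{10}X_{i1}}{10} = 0.$$

Going back to Eq. (15.5), we now see that

$$2\sum_{j=1}^{4}(\bar{X}_{.j} - \bar{X}_{..})\sum_{i=1}^{10}(X_{ij} - \bar{X}_{.j}) = 2\left[(\bar{X}_{.1} - \bar{X}_{..})\cdot 0 + \ldots + (\bar{X}_{.4} - \bar{X}_{..})\cdot 0\right] = 0.$$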

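Therefore,

$$\sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{..})^2 = \sum_{j=1}^{4}\sum_{i=1}^{10}(\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{.j})^2. \qquad (15.6)$$

Notice that in the first term on the right side of Eq. (15.6) no i subscript
appears. The quantity $\sum_{i=1}^{10}(\bar{X}_{.j} - \bar{X}_{..})^2$ is the sum of $(\bar{X}_{.j} - \bar{X}_{..})^2$ for i from one to 10 when
j has been fixed. For example, for the 10 scores in
group one, $(\bar{X}_{.1} - \bar{X}_{..})^2$ is the same for each score, so the sum is $10(\bar{X}_{.1} - \bar{X}_{..})^2$. Hence

$$\sum_{j=1}^{4}\sum_{i=1}^{10}(\bar{X}_{.j} - \bar{X}_{..})^2 = \sum_{j=1}^{4}10(\bar{X}_{.j} - \bar{X}_{..})^2.$$

Thus we have broken the total sum of squares into two parts:

$$\sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{..})^2 = \sum_{j=1}^{4}10(\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{.j})^2,$$

$$SS_{tot} = SS_{bet} + SS_{within}.$$

The total sum of squares ($SS_{tot}$) has been partitioned (analyzed) into
two additive components, the sum of squares between groups ($SS_{bet}$) and the
sum of squares within groups ($SS_{within}$). $SS_{tot}$ reflects variation in the scores
obtained. This total variation has been analyzed into two components,
$SS_{bet}$ and $SS_{within}$. Thus the name “analysis of variance” (abbreviated
ANOVA). You will soon see how this analysis is used to test the null
hypothesis that $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$.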
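The partition of the total sum of squares can be verified numerically. The following sketch uses hypothetical scores (four groups of five, an assumption made for brevity; the text's example has ten per group) and illustrative variable names.

```python
# Hypothetical scores: one inner list per treatment group.
groups = [
    [10, 12, 11, 13, 14],
    [14, 15, 13, 16, 17],
    [12, 11, 13, 12, 12],
    [18, 17, 19, 16, 20],
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Total: squared deviation of every score from the grand mean.
ss_tot = sum((x - grand_mean) ** 2 for x in all_scores)

# Between: group size times squared deviation of each group mean
# from the grand mean.
ss_bet = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Within: squared deviation of every score from its own group mean.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

print(ss_tot, ss_bet, ss_within)
```

For these scores the between-groups and within-groups sums of squares are 123.75 and 32, and they add to the total of 155.75; the cross-product term contributes nothing because deviations from a group mean sum to zero within each group.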


15.5 

RESTATEMENT OF THE NULL 
HYPOTHESIS IN TERMS OF 
POPULATION MEANS 

Before proceeding, let's phrase the null hypothesis that $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$
in an equivalent but slightly different form. The estimator of $\alpha_1$ was taken
to be $\hat{\alpha}_1 = \bar{X}_{.1} - \bar{X}_{..}$. Since $\hat{\alpha}_1$ is an unbiased estimator of $\alpha_1$, the expectation
E (long-term average) of $\hat{\alpha}_1$ is $\alpha_1$:

$$E(\hat{\alpha}_1) = E(\bar{X}_{.1} - \bar{X}_{..}) = E(\bar{X}_{.1}) - E(\bar{X}_{..}) = \mu_1 - \mu = \alpha_1.$$

Similarly, $\alpha_2 = \mu_2 - \mu$, $\alpha_3 = \mu_3 - \mu$, and $\alpha_4 = \mu_4 - \mu$.

The null hypothesis can be written as

$$(\mu_1 - \mu) = (\mu_2 - \mu) = (\mu_3 - \mu) = (\mu_4 - \mu).$$

Add μ to each of the four terms and you will see that this is obviously the
same as saying that $\mu_1 = \mu_2 = \mu_3 = \mu_4$. Hence, the null hypothesis that
$\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$ in the linear model $X_{ij} = \mu + \alpha_j + e_{ij}$ is the same as the
hypothesis that the means of the normal populations from which the samples
are drawn are all equal, i.e.,

$$\mu_1 = \mu_2 = \mu_3 = \mu_4.$$

We may state the null hypothesis in either form. A third equivalent form,
when $n_1 = n_2 = n_3 = n_4$, is

$$H_0: \sum_{j=1}^{4}(\mu_j - \bar{\mu}_.)^2 = 0,$$

where $\bar{\mu}_.$ is the average of $\mu_1, \ldots, \mu_4$ and equals μ.


15.6 

DEGREES OF FREEDOM 


Some more machinery is needed before we can show how the null hypothesis 
can be tested. 

We must associate with each quantity in the partitioning of the total
sum of squares ($SS_{tot}$, $SS_{bet}$, $SS_{within}$) an integer called the degrees of freedom.
“Degrees of freedom” is a name borrowed from the physical sciences where
it denotes a characteristic of the movement of an object. If an object is
free to move in a straight line only, it has one degree of freedom; an object
that is free to move through any point in a plane, such as a bowling ball
rolling down an alley, has two degrees of freedom; a ball in a handball
court has three degrees of freedom: it can go from back to front, side to side,
and floor to ceiling. You may be surprised to learn that the techniques we
group under the rubric “analysis of variance” or ANOVA (the partitioning
of $SS_{tot}$ and testing the null hypothesis) have a geometric interpretation.
The name “degrees of freedom” enters into analysis of variance (ANOVA)
by way of its geometric interpretation.

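First, let us discuss the degrees of freedom associated with $SS_{\text{between}}$. Consider the definition of $SS_{\text{between}}$:

$$SS_{\text{between}} = \sum_{j=1}^{4} 10\,(\bar{X}_{.j} - \bar{X}_{..})^2.$$

The four group means, $\bar{X}_{.1}, \ldots, \bar{X}_{.4}$, are related to $\bar{X}_{..}$ by the equation:

$$\frac{\bar{X}_{.1} + \bar{X}_{.2} + \bar{X}_{.3} + \bar{X}_{.4}}{4} = \bar{X}_{..}. \qquad (15.7)$$

If $\bar{X}_{..}$ and any three of the group means are given, what must the fourth group mean be? Eq. (15.7) leaves only one possible value: four times $\bar{X}_{..}$ minus the sum of the other three means. You are free to assign any values whatsoever to three of the group means, but having done so the last group mean is determined. Hence $SS_{\text{between}}$ has 3 degrees of freedom, one less than the number of group means, i.e., $J - 1$. We will abbreviate "degrees of freedom for $SS_{\text{between}}$" to $df_{\text{between}}$.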

Consider now the degrees of freedom associated with $SS_{\text{within}}$.

$$SS_{\text{within}} = \sum_{j=1}^{4} \sum_{i=1}^{10} (X_{ij} - \bar{X}_{.j})^2.$$

For group one, the computation of $SS_{\text{within}}$ involves

$$(X_{11} - \bar{X}_{.1})^2 + (X_{21} - \bar{X}_{.1})^2 + \ldots + (X_{10,1} - \bar{X}_{.1})^2.$$

How are $X_{11}, X_{21}, \ldots, X_{10,1}$ and $\bar{X}_{.1}$ related?

$$\frac{X_{11} + X_{21} + \ldots + X_{10,1}}{10} = \bar{X}_{.1}.$$

If $\bar{X}_{.1}$ were a fixed number, say 12.40, to how many of the 10 quantities $X_{11}$ through $X_{10,1}$ could you assign any number you wished before you have to assign a value or values that always make $\bar{X}_{.1}$ be 12.40? The answer is nine.

Similarly,

$$(X_{12} - \bar{X}_{.2})^2 + (X_{22} - \bar{X}_{.2})^2 + \ldots + (X_{10,2} - \bar{X}_{.2})^2$$

is used in computing $SS_{\text{within}}$. Since $(X_{12} + X_{22} + \ldots + X_{10,2})/10 = \bar{X}_{.2}$, if $\bar{X}_{.2}$ is fixed, then nine of the 10 single numbers can be freely assigned before the tenth number must be assigned that gives the proper $\bar{X}_{.2}$. There are nine degrees of freedom for each of the four groups in the computation of $SS_{\text{within}}$; hence the degrees of freedom for $SS_{\text{within}}$ equals

$$(10 - 1) + (10 - 1) + (10 - 1) + (10 - 1) = 40 - 4 = 36,$$

the total number of observations less the number of groups, i.e., $Jn - J = J(n - 1)$. Space can be saved if "degrees of freedom for $SS_{\text{within}}$" is denoted by $df_{\text{within}}$.

$SS_{\text{total}}$ equals

$$(X_{11} - \bar{X}_{..})^2 + (X_{21} - \bar{X}_{..})^2 + \ldots + (X_{10,4} - \bar{X}_{..})^2;$$

41 quantities are involved (the 40 scores and the grand mean $\bar{X}_{..}$), and they are related as follows:

$$\frac{X_{11} + X_{21} + \ldots + X_{10,4}}{40} = \bar{X}_{..}.$$

Of the 40 quantities on the left of the equation, 39 can be assigned numbers without restriction before the fortieth one must be given that number that yields the preassigned $\bar{X}_{..}$. The degrees of freedom for $SS_{\text{total}}$ equals $40 - 1 = 39$, the total number of observations minus 1, i.e., $Jn - 1$.
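The counting argument of this section can be checked with a few lines of Python. This is only a minimal sketch: the function names `degrees_of_freedom` and `last_group_mean`, and the illustrative means (4, 6, and 3 with a grand mean of 5), are assumptions made for the example and do not come from the experiment.

```python
def degrees_of_freedom(J, n):
    """Return (df_between, df_within, df_total) for J groups of n scores each."""
    return J - 1, J * (n - 1), J * n - 1


def last_group_mean(grand_mean, other_means):
    """With equal n's, the one group mean that Eq. (15.7) forces once the
    grand mean and all the other group means have been chosen."""
    J = len(other_means) + 1
    return J * grand_mean - sum(other_means)


df_b, df_w, df_t = degrees_of_freedom(J=4, n=10)
print(df_b, df_w, df_t)                        # 3 36 39

# Hypothetical values: three means chosen freely, the fourth is then fixed.
print(last_group_mean(5.0, [4.0, 6.0, 3.0]))   # 7.0
```

Note that the between and within degrees of freedom add up to the total, just as the sums of squares do.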


15.7 

MEAN SQUARES 


A sum of squares (SS) divided by its degrees of freedom (df) is called a mean square (MS). In the one-factor ANOVA, only two mean squares will be of interest: the mean square between, $MS_b$, equals $SS_b/df_b$, and the mean square within, $MS_w$, is $SS_w/df_w$.* A "mean square total" will not be defined because it would prove useless for the purpose of testing the null hypothesis. However, an important relationship to keep in mind is $SS_b + SS_w = SS_t$; another is $df_b + df_w = df_t = Jn - 1$.

* In the remainder of this chapter "between" will be abbreviated to b and "within" to w, so we write $SS_b$, $SS_w$, $df_b$, $df_w$, $MS_b$, and $MS_w$.


15.8

EXPECTATIONS OF MS_b
AND MS_w


At this point, hopefully, you'll start finding the answers to some of the questions which must have arisen in your mind while studying the last few pages. Why partition $SS_t$ into $SS_b$ and $SS_w$? Why define degrees of freedom and mean squares? To test the null hypothesis $H_0$, of course; but by looking at the expectations of $MS_b$ and $MS_w$, you'll begin to see how the machinery developed so far relates to $H_0$.

The expectation of $MS_w$ means the long-term average of $MS_w$ from experiment to experiment. The expectation of $MS_w$ will be denoted by $E(MS_w)$. If in our learner-activity experiment we were to repeat the experiment an indefinitely large number of times and each time compute $MS_w$, then the average of all of these $MS_w$ values would be $E(MS_w)$. We can conceive of doing this even though it would be impossible to do so physically. $E(MS_w)$ is the average of all of the $MS_w$ values in that population of experiments about which we wish to make an inference from the data obtained this one time.

We can write $E(MS_w)$ in terms of a characteristic of the normal populations from which the scores obtained in the experiment are a random sample.

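First, let's look at $MS_w$ in a slightly different way.

$$MS_w = \frac{SS_w}{J(n-1)} = \frac{\sum_{j=1}^{4}\sum_{i=1}^{10}(X_{ij} - \bar{X}_{.j})^2}{4(10 - 1)}$$

$$= \frac{1}{4}\left[\frac{\sum_i (X_{i1} - \bar{X}_{.1})^2}{9} + \frac{\sum_i (X_{i2} - \bar{X}_{.2})^2}{9} + \frac{\sum_i (X_{i3} - \bar{X}_{.3})^2}{9} + \frac{\sum_i (X_{i4} - \bar{X}_{.4})^2}{9}\right].$$

Isn't $\sum_i (X_{i1} - \bar{X}_{.1})^2/9$ the sample variance of group one? We shall denote the sample variances of groups one, two, three, and four by $s_1^2$, $s_2^2$, $s_3^2$, and $s_4^2$, respectively.

$MS_w = (s_1^2 + s_2^2 + s_3^2 + s_4^2)/4$; $MS_w$ is the average of the sample variances of the four groups. Each sample variance is an unbiased estimator of the variance of the normal population from which the scores in that group were sampled. Since all four normal populations had the same variance, $\sigma^2$, we argue that

$$E(MS_w) = E\left[\tfrac{1}{4}(s_1^2 + \ldots + s_4^2)\right] = \tfrac{1}{4}\left[E(s_1^2) + \ldots + E(s_4^2)\right] = \tfrac{1}{4}(\sigma^2 + \sigma^2 + \sigma^2 + \sigma^2) = \sigma^2.$$

The expectation of $MS_w$ is $\sigma^2$.

$E(MS_w)$ does not depend on the means of the populations underlying the groups in the experiment; $MS_w$ is "mean-free," reflecting only the variability among the measures within groups. Such variability is around each group mean, rather than around the grand mean of all the groups. Whether all of the groups are samples from the same normal population or all the normal populations have different means, $E(MS_w)$ will be the same, $\sigma^2$.*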
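The claim that $E(MS_w)$ is $\sigma^2$ whatever the population means can be made concrete by actually carrying out, on a computer, the "indefinitely repeated" experiment the text asks you to imagine. The following is a minimal simulation sketch, not a proof. The population standard deviation (12), the two sets of population means, the number of repetitions, and all function names are assumptions chosen only for illustration.

```python
import random


def sample_variance(scores):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


def ms_within(groups):
    """MS_w for equal n's: the average of the J group variances."""
    return sum(sample_variance(g) for g in groups) / len(groups)


def average_ms_within(mus, sigma, n=10, reps=2000, seed=1):
    """Average MS_w over many simulated experiments with the given means."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        total += ms_within(groups)
    return total / reps


sigma = 12.0  # hypothetical population standard deviation; sigma**2 = 144
same_means = average_ms_within([50.0, 50.0, 50.0, 50.0], sigma)
different_means = average_ms_within([45.0, 50.0, 55.0, 65.0], sigma)
print(round(same_means, 1), round(different_means, 1))  # both close to 144
```

Both averages settle near 144 because each score enters $MS_w$ only as a deviation from its own group mean, so shifting a population mean leaves $MS_w$ unchanged.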

The same cannot be said for $E(MS_b)$, however. We shall show that if all of the normal populations underlying the samples in the experiment have the same mean (we've already assumed they have the same variance), then $E(MS_b)$ is $\sigma^2$. If, however, at least two of the population means are different, then $E(MS_b)$ will be larger than $\sigma^2$. If all the population means are equal, then the null hypothesis that $\mu_1 = \mu_2 = \mu_3 = \mu_4$ is true. If at least two population means are different, e.g., $\mu_1 = \mu_2$ but $\mu_3 \neq \mu_4$, or $\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ (these are two of many possible examples), then the null hypothesis $H_0$ is false.

We shall state without proof that

$$E(MS_b) = \sigma^2 + \frac{n \sum_{j=1}^{J} (\mu_j - \mu)^2}{J - 1}, \qquad (15.8)$$

where $\sigma^2$ is the variance in each population,

$n$ is the number of subjects in each group (10 in our example),

$J$ is the number of groups (four in our example),

$\mu_j$ is the population mean of the $j$th population, and

$\mu$ is the mean of the $J$ population means.

Let us say for example that all four population means were 6.45. (This is a case where $H_0$ is true.) Then $\mu$ would be 6.45 and

$$\sum_{j=1}^{4} (\mu_j - \mu)^2 = 0,$$

so that $E(MS_b) = \sigma^2$.

Take an example where $H_0$ is not true and satisfy yourself that $E(MS_b)$ is greater than $\sigma^2$.
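One way to take up that invitation is to evaluate Eq. (15.8) directly. The sketch below is only an illustration: the function name `expected_ms_between`, the population variance of 100, and the unequal means 5, 6, 7, 8 are assumed values, while the case with all means equal to 6.45 is the one mentioned in the text.

```python
def expected_ms_between(sigma2, n, mus):
    """Eq. (15.8): sigma^2 + n * sum((mu_j - mu)^2) / (J - 1)."""
    J = len(mus)
    mu = sum(mus) / J  # mean of the J population means
    return sigma2 + n * sum((m - mu) ** 2 for m in mus) / (J - 1)


sigma2 = 100.0  # hypothetical common population variance

# H0 true: all four population means equal 6.45, so the second term vanishes.
h0_true = expected_ms_between(sigma2, 10, [6.45, 6.45, 6.45, 6.45])

# H0 false: hypothetical unequal means; the second term is 10 * 5 / 3.
h0_false = expected_ms_between(sigma2, 10, [5.0, 6.0, 7.0, 8.0])

print(round(h0_true, 2), round(h0_false, 2))  # 100.0 116.67
```

Any departure from equal means adds a positive amount to $\sigma^2$, and the amount grows with $n$ and with the spread of the $\mu_j$.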

To summarize:

1. If $H_0$ is true, then

$$E(MS_w) = \sigma^2 \quad \text{and} \quad E(MS_b) = \sigma^2.$$

2. If $H_0$ is false, then

$$E(MS_w) = \sigma^2 \quad \text{and} \quad E(MS_b) = \sigma^2 + \frac{n \sum_{j=1}^{J} (\mu_j - \mu)^2}{J - 1},$$

which is greater than $\sigma^2$.

* Convince yourself that this is indeed true. Start with $E(MS_w) = E[(SS_{w1} + SS_{w2} + \ldots + SS_{wJ})/J(n - 1)]$. For each $SS_{wj}$ substitute its equivalent, $(n - 1)s_j^2$. Then distribute the expectations and simplify. Further, note that the definition of $SS_{wj}$ is $\sum_i (X_{ij} - \bar{X}_{.j})^2$, which does not involve the overall mean, $\bar{X}_{..}$.

In any experiment, $MS_b$ and $MS_w$ are known, but the values of their expectations are not. By comparing $MS_b$ to $MS_w$, one can determine if $H_0$ is plausible. If $MS_b$ is very large relative to $MS_w$, then it is likely that $H_0$ is false. But the test of the null hypothesis is not as simple as this. It can be shown that $MS_b$ and $MS_w$ are independent of one another (this is one reason for the original assumption of normal populations).

Suppose for the moment that you draw four samples at random of 10 scores each from a normal population. This is exactly what would be done if in our example experiment four conditions were equally efficient (in terms of the average score they produced). The 40 scores sampled could be arranged in a table, and $MS_b$ and $MS_w$ could be computed. Remember this is a case of $H_0$ being true. Would $MS_b$ and $MS_w$ be equal? Their expectations would be equal; but from one sampling to the next, $MS_b$ might be a lot larger than $MS_w$ this time, and maybe a lot smaller the next time. The sample values of $MS_b$ and $MS_w$ still fluctuate independently from sample to sample. How, then, can the researcher be sure that a large value of $MS_b$ relative to $MS_w$ didn't arise simply by random fluctuations in scores sampled from the same normal population? The researcher can never be certain; but we shall see that he can control the proportion of times he reaches such a mistaken conclusion.


15.9

SOME DISTRIBUTION THEORY

We saw in Chapter 11 that a chi-square variable with one degree of freedom has the form

$$\frac{(X - \mu)^2}{\sigma^2} = z^2 \sim \chi^2_1,$$

i.e., the squared normal deviate is distributed as chi-square with 1 degree of freedom. Here $X$ is a normally distributed variable with population mean $\mu$ and variance $\sigma^2$.

Suppose one randomly samples $n$ scores from a normal distribution with mean and variance $\mu$ and $\sigma^2$. Since chi-square variables have the special additive property noted in Chapter 11, the quantity

$$\frac{(X_1 - \mu)^2}{\sigma^2} + \frac{(X_2 - \mu)^2}{\sigma^2} + \ldots + \frac{(X_n - \mu)^2}{\sigma^2} \sim \chi^2_n.$$

The sum of the squared $z$ scores has a chi-square distribution with $n$ degrees of freedom, i.e., this sum computed on repeated samples of $n$ scores has a known frequency distribution, $\chi^2_n$.

We shall not prove the assertion; however, it is true (see, e.g., Wilks, 1962) that if $X_1, \ldots, X_n$ are $n$ independent observations from a normal population with variance $\sigma^2$, then

$$\frac{(X_1 - \bar{X}_.)^2}{\sigma^2} + \frac{(X_2 - \bar{X}_.)^2}{\sigma^2} + \ldots + \frac{(X_n - \bar{X}_.)^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Therefore, $\sum_i (X_i - \bar{X}_.)^2/\sigma^2 \sim \chi^2_{n-1}$. Also, $\sum_i (X_{ij} - \bar{X}_{.j})^2/\sigma^2 \sim \chi^2_{n-1}$.

Notice that $\bar{X}_.$, the sample mean of the $n$ observations, has replaced $\mu$, the mean of the population from which the observations were selected. Also, the chi-square variable has $n - 1$ instead of $n$ degrees of freedom.

For the experiment at the beginning of the chapter we defined $SS_w$:

$$SS_w = \sum_{i=1}^{10} (X_{i1} - \bar{X}_{.1})^2 + \sum_{i=1}^{10} (X_{i2} - \bar{X}_{.2})^2 + \ldots + \sum_{i=1}^{10} (X_{i4} - \bar{X}_{.4})^2.$$

If we divided the first quantity on the right of this equation by $\sigma^2$, it would have the distribution $\chi^2_9$ (chi-square with 9 degrees of freedom). This is so because we assumed the scores in group one were randomly drawn from a normal population with variance $\sigma^2$. The same can be said for the other three quantities on the right of the equation. Since all four quantities, each divided by $\sigma^2$, are distributed as $\chi^2_9$, their sum divided by $\sigma^2$ is distributed as $\chi^2_{9+9+9+9}$ or $\chi^2_{36}$ (chi-square with 36 degrees of freedom). Hence, for our example

$$\frac{SS_w}{\sigma^2} \sim \chi^2_{36}.$$

We can divide $SS_w$ by 36 to obtain $MS_w$; then

$$\frac{MS_w}{\sigma^2} \sim \frac{\chi^2_{36}}{36}.$$

This fact will be saved for future use.

In general, how is the mean $\bar{X}_.$ of a group distributed? That is, what frequency distribution would result if we sampled $n$ scores at random from a population, computed $\bar{X}_.$, then recorded only this one score, $\bar{X}_.$? What would the distribution of these scores be like if this process of sampling, computing $\bar{X}_.$, and recording it were continued indefinitely? What would be the mean, variance, and shape of this distribution?

If the means, $\bar{X}_.$'s, are based on $n$ scores randomly drawn from a normal population, they will:

1. be normally distributed themselves,

2. have mean $\mu$, and

3. have variance $\sigma^2/n$.

Using the above facts, we can see that $(\bar{X}_. - \mu)/(\sigma/\sqrt{n})$ has a mean of zero and a variance of 1; hence, it is a $z$ score. Consequently

$$\frac{(\bar{X}_. - \mu)^2}{\sigma^2/n} = \frac{n(\bar{X}_. - \mu)^2}{\sigma^2} \sim \chi^2_1.$$

If the null hypothesis is true, then

$$\frac{\sum_{j=1}^{J} n(\bar{X}_{.j} - \bar{X}_{..})^2}{\sigma^2} \sim \chi^2_{J-1}, \qquad (15.9)$$

because under that condition $E(\bar{X}_{.1}) = E(\bar{X}_{.2}) = \ldots = E(\bar{X}_{.J}) = \mu$. In our example $J - 1 = 3$. Dividing both sides of Eq. (15.9) by 3, we see that

$$\frac{\left[\sum_{j=1}^{4} 10(\bar{X}_{.j} - \bar{X}_{..})^2\right]/3}{\sigma^2} \sim \frac{\chi^2_3}{3},$$

provided $H_0$, the null hypothesis, is true.

The ratio of two independent chi-square variables, each of which is divided by its degrees of freedom, has an $F$-distribution. Form the ratio of the quantity above to $MS_w/\sigma^2$, which is distributed as $\chi^2_{36}/36$. Since $\sigma^2$ appears in both numerator and denominator it cancels itself. You should recognize that the numerator is $MS_b$ and the denominator is $MS_w$. To summarize then, if the null hypothesis is true,

$$\frac{MS_b}{MS_w} \sim F_{3,36},$$

i.e., the ratio of the mean-square between, $MS_b$, to the mean-square within, $MS_w$, has an $F$-distribution with 3 and 36 degrees of freedom when $\mu_1 = \mu_2 = \mu_3 = \mu_4$.

The ratio $MS_b/MS_w$ is called the $F$-ratio. The $F$-ratio is the statistic that will be used in the last steps of the test of $H_0$.


15.10 

THE F-TEST OF THE NULL 
HYPOTHESIS: RATIONALE 
AND PROCEDURE 


To coalesce the distribution theory that has been developed, consider the act of repeatedly sampling at random four groups of 10 scores each from the same normal distribution (note that under these conditions the null hypothesis is true). If each time this were done the $F$-ratio $MS_b/MS_w$ were calculated and its value recorded, the frequency distribution of these $F$-ratios (many, many of them) would look like the mathematical curve $F_{3,36}$. This knowledge is extremely important, for with it the statistician can calculate the percent of the $F$-ratios that exceed any given value, 6.12 for instance; or he can determine the number that is exceeded by only 5% or 1% of the $F$-ratios. He does this assuming the null hypothesis is true. The $F$-ratios obtained in this procedure of repeatedly sampling, calculating, and recording would all be greater than zero.

What if the sampling of four groups of 10 scores each was not done from the same normal population? That is, what if sampling was done under the condition of a false null hypothesis? Under this condition, the expectation of $MS_b$ would be larger than it would be if the null hypothesis were true. However, $MS_w$ would have the same expectation, $\sigma^2$. When the null hypothesis is false, $MS_b$ divided by $\sigma^2$ does not have a chi-square distribution divided by its degrees of freedom, and the ratios $MS_b/MS_w$ obtained from the sampling do not have an $F$-distribution. We do know, however, what effect this sampling under conditions of a false null hypothesis has on the distribution of the $F$-ratios. They will be on the average larger than the $F$-ratios obtained by random sampling from a single normal population.

Suppose we randomly sampled repeatedly four groups of 10 scores each from a normal population and calculated the $F$-ratio each time. The plot of the $F$-ratios obtained would look like the curve $F_{3,36}$ in Fig. 15.2. Now suppose we drew repeatedly two groups of 10 scores each from a normal population with mean $\mu_1$, and two more groups of 10 scores from a normal population with a different mean $\mu_2$. The null hypothesis is false, but we wouldn't in practice know this because $\mu_1$ and $\mu_2$ are unknown characteristics of populations. The distribution of the $F$-ratios obtained would look something like curve $F^*$ in Fig. 15.2. Notice that in general the $F$-ratios shift to the right (are larger) when sampling is done under conditions of a false null hypothesis instead of a true null hypothesis. Even so, some of the values in the curve $F_{3,36}$ are larger than some of the values in curve $F^*$.

FIG. 15.2 Sampling distributions of $F = MS_b/MS_w$ when $H_0$ is true (curve $F_{3,36}$) and when $H_0$ is false (curve $F^*$).
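The two sampling situations just described can be imitated with a short simulation. This is a rough sketch under assumed values, not a reproduction of Fig. 15.2: the population means (50 and 58), the standard deviation (10), the number of repetitions, and every function name are hypothetical choices made for illustration.

```python
import random


def sample_variance(scores):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


def f_ratio(groups):
    """F = MS_b / MS_w for J groups with equal n."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    ms_b = n * sample_variance(means)  # n * sum((mean_j - grand)^2) / (J - 1)
    ms_w = sum(sample_variance(g) for g in groups) / len(groups)
    return ms_b / ms_w


def simulate_f_ratios(mus, sigma=10.0, n=10, reps=3000, seed=7):
    """F-ratios from repeated experiments with the given population means."""
    rng = random.Random(seed)
    return [
        f_ratio([[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus])
        for _ in range(reps)
    ]


def share_above(values, cutoff):
    """Proportion of the values that exceed the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


f_true = simulate_f_ratios([50.0, 50.0, 50.0, 50.0])   # H0 true
f_false = simulate_f_ratios([50.0, 50.0, 58.0, 58.0])  # H0 false (two populations)

print(share_above(f_true, 3.25), share_above(f_false, 3.25))
```

Under these assumptions the second proportion is much larger than the first, which is the rightward shift of the $F$-ratios described in the text; yet some $F$-ratios from the true-$H_0$ runs still exceed some from the false-$H_0$ runs.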

Is a value of $F$ greater than 3.25 more likely to occur if one is sampling under conditions of a true or of a false null hypothesis? Compare the areas under the two curves to the right of the point 3.25 and see which area is larger (i.e., which has the greater probability of yielding a value of $F$ greater than 3.25).

The statistician can find the percentile points of the curve $F_{3,36}$. He can find the value that is exceeded by such a small percent as 5% or 1% of the values in the curve $F_{3,36}$; i.e., he can find the 95th and 99th percentiles of the curve $F_{3,36}$. The 95th percentile of $F_{3,36}$ is 2.86; the 99th is 4.38. Table E of Appendix A shows the 75th, 90th, 95th, 97.5th, 99th, 99.5th, and 99.9th percentiles of various $F$-distributions. As we saw in Chapter 11, the $F$-distribution depends on two values: the degrees of freedom for the numerator and the degrees of freedom for the denominator of the $F$-ratio. To find the 95th percentile of $F_{2,50}$, find the value at the intersection of column 2 and row 50 in that portion of the table headed 95th percentile. The value of ${}_{.95}F_{2,50}$ is 3.18.

Suppose that an $F$-ratio of 6.51 was obtained with 3 and 36 degrees of freedom. If the null hypothesis is true, the $F$-ratio was sampled from the curve $F_{3,36}$, and something very unlikely would have occurred. Less than five times in 100 a value of $F$ larger than 2.86 would be obtained if the null hypothesis were true. If the null hypothesis were false, then large values of $F$ are more likely to be observed. The researcher agrees to reason thus:

If the value of the $F$-ratio obtained would occur less than five times in 100 (i.e., if it is greater than the 95th percentile of $F_{3,36}$) if the null hypothesis were true, then I'll conclude that the null hypothesis is false. It seems more likely that such a value indicates a false null hypothesis since large values are more likely if the null hypothesis is false.

Choosing the 95th percentile of the curve $F_{3,36}$ as the point on which the decision about $H_0$ hinges is rather arbitrary. One could have chosen the 90th, 99th, or 99.9th percentile point. What if the 50th percentile point had been chosen? That is, what if one had agreed to reject the null hypothesis if the $F$-ratio was greater than the 50th percentile of $F_{3,36}$? If the null hypothesis were true, one would have a probability of 1/2 of rejecting $H_0$ (calling it false). If researchers agreed to use the 50th percentile point to make their decisions, they would often be concluding that one method is better than another when the methods are actually no different. Scientists want to guard against these errors; so they agree to conclude that the null hypothesis is false only when values as large as or larger than the value of $F$ obtained have a small probability of occurring when the null hypothesis is true, say .10, .05, or .01. These values correspond to using the 90th, 95th, and 99th percentile points, respectively.

Do not make the mistake of thinking that if the $F$-ratio obtained in our experiment were larger than the chosen percentile point, the null hypothesis is certainly false. Such assertions are not true statements about our situation. A researcher makes a conclusion of the form "I reject $H_0$ as a true statement about the means of the populations I've sampled" or "I do not reject $H_0$ as ... ." He is never certain of the truth of his conclusion. He does know, however, that his conclusions will be correct a certain percent (90%, 95%, ...) of the times he makes them, given the truth or falsity of $H_0$.

Suppose that the four activity levels were all alike with respect to the amount achieved. Suppose also that our researcher ran the same experiment, with four groups of 10, a very large number of times. Each time he would compute an $F$-ratio and draw a conclusion about $H_0$. He is not aware of it, but we know that the obtained $F$-ratios would follow the curve $F_{3,36}$. We know, for example, that 5% of the area under this curve lies to the right of the point 2.86. Our researcher has agreed that if the $F$-ratio exceeds 2.86, he will say that $H_0$ is false. He will expect 5% of the $F$-ratios he obtains to be greater than 2.86; hence, he will be expected to make the mistake of rejecting $H_0$ when it is really true 5% of the time.

Notice that the value (2.86) the researcher chooses, such that any $F$-ratio exceeding that number is regarded by him as evidence for the falsity of $H_0$, is arbitrary. If he had chosen 4.38, the 99th percentile point in the distribution $F_{3,36}$, and if $H_0$ is true, then on the average one time in 100 he will obtain an $F$-ratio exceeding 4.38 and conclude incorrectly that $H_0$ is false. The probability that the researcher will reject $H_0$ as the true state of nature when $H_0$ is actually true is called $\alpha$ (alpha). The size of $\alpha$ is under the researcher's control; he can make it as large or as small as he wishes by his selection of a number from the $F$-table that will govern his decision.
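The long-run meaning of $\alpha$ can be illustrated by letting a computer play the researcher who repeats the experiment many times while $H_0$ is true. The sketch below is an assumption-laden illustration: it draws four groups of 10 scores from a single standard normal population, uses the cutoffs 2.86 and 4.38 quoted in the text, and its function names, seed, and number of repetitions are arbitrary choices.

```python
import random


def sample_variance(scores):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


def f_ratio(groups):
    """F = MS_b / MS_w for J groups with equal n."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    ms_b = n * sample_variance(means)
    ms_w = sum(sample_variance(g) for g in groups) / len(groups)
    return ms_b / ms_w


def rejection_rate(cutoff, J=4, n=10, reps=4000, seed=11):
    """Proportion of experiments, run with H0 true, in which F exceeds cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(J)]
        if f_ratio(groups) > cutoff:
            rejections += 1
    return rejections / reps


rate_05 = rejection_rate(2.86)  # cutoff at the 95th percentile of F(3, 36)
rate_01 = rejection_rate(4.38)  # cutoff at the 99th percentile of F(3, 36)
print(rate_05, rate_01)         # close to .05 and .01, respectively
```

Every rejection counted here is a mistake, since all four groups come from one population; moving the cutoff from 2.86 to 4.38 is how the researcher makes such mistakes rarer.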

It is customary to assume (as we saw in Sec. 13.5) that there are two states of nature that can exist relative to our ANOVA model: $H_0$ can be true, or it can be false. A researcher agrees to make one of two conclusions after inspecting his data: Reject $H_0$ as explaining my situation, or do not reject (continue to entertain) $H_0$ as an explanation of my situation. Four outcomes of an experiment are possible, as shown in the accompanying table:

                                    State of nature
Researcher's conclusion      $H_0$ is true        $H_0$ is false

Reject $H_0$                 Type I error         No error is made
Do not reject $H_0$          No error is made     Type II error

If the researcher does not find evidence for the falsity of $H_0$ and $H_0$ is actually true (see the accompanying table), then his thoughts will be in accord with the true state of nature. If $H_0$ is actually true and the researcher rejects it in drawing his conclusion, we say he has made a "type I error" or an "error of the first kind." If $H_0$ is actually false, and the researcher fails to reject $H_0$, we say he has committed a "type II error" or an "error of the second kind." The names are a convenience; that one is a type I error and the other a type II error is of no significant meaning.

The probability of committing a type I error is equal to $\alpha$. The researcher can control the size of $\alpha$; thus he can make the probability of incorrectly rejecting $H_0$ very large ($\alpha$ = .25 or .30) or very small ($\alpha$ = .01 or .001). It is customary to assign a value to $\alpha$ of .05 or .01. Although values of .05 and .01 may be appropriate for industrial or agricultural research, in which the ANOVA first enjoyed popularity, there is little reason to condone their use exclusively in educational and psychological research. The value of $\alpha$ a researcher wishes to use should depend on certain aspects of his particular analysis. Values of $\alpha$ = .15 or .10 might be justifiable if only a small number of subjects are included in the experiment. As will be shown later, the size of $\alpha$ is related in an indirect way to the probability of a type II error (failing to reject a false hypothesis). The probability of failing to reject a false null hypothesis is under the control of the researcher to some extent. It depends on the number of subjects in the experiment and the value of $\alpha$ chosen, among other things. We shall denote the probability of correctly rejecting $H_0$ when it is false by some amount by $1 - \beta$. So the probability of a type II error is $\beta$. $1 - \beta$ is the probability of rejecting $H_0$ when it is false; it is called the power of the test. The larger $\alpha$, the larger $1 - \beta$. The value of $1 - \beta$ is important to the researcher. All too often, people completely ignore "power." They invest large amounts of time and money in an experiment in which their probability of discovering differences between the treatments is only about .2. That is, even if differences of a given magnitude exist, the researcher has only about two chances in 10 of rejecting $H_0$ ("discovering" that differences exist). There's a high probability that he won't find what he's looking for, even when it is there.

FIG. 15.3 Illustration of the probabilities of type I and type II errors for $J = 4$, $n = 10$, and a particular alternative hypothesis.

It may happen that if a researcher sets $\alpha$ at .05 the corresponding value of $1 - \beta$ would be .20, but that with a larger value of $\alpha$ the value of $1 - \beta$ would be considerably larger. Obviously, the prudent course of action may then be to run a greater risk of making a type I error to increase the probability, $1 - \beta$, of finding differences between the treatments.

The definitions of $\alpha$ and $\beta$ and their relationship are depicted in Fig. 15.3. The calculation of the power of the $F$-test of $H_0$ will be illustrated in Sec. 15.15.
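Before the formal calculation of power, the trade-off between $\alpha$ and $1 - \beta$ can be seen in a rough simulation. This sketch estimates power by brute force for one particular alternative; it is not the method of Sec. 15.15. The alternative (two populations with mean 50 and two with mean 58, standard deviation 10), the seed, the number of repetitions, and the function names are all assumed for illustration, while the cutoffs 2.86 and 4.38 are the percentile points of $F_{3,36}$ quoted earlier.

```python
import random


def sample_variance(scores):
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


def f_ratio(groups):
    """F = MS_b / MS_w for J groups with equal n."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    ms_b = n * sample_variance(means)
    ms_w = sum(sample_variance(g) for g in groups) / len(groups)
    return ms_b / ms_w


def estimated_power(mus, sigma, cutoff, n=10, reps=3000, seed=3):
    """Proportion of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]
        if f_ratio(groups) > cutoff:
            rejections += 1
    return rejections / reps


alternative = [50.0, 50.0, 58.0, 58.0]  # hypothetical population means; H0 is false
power_05 = estimated_power(alternative, 10.0, cutoff=2.86)  # alpha = .05
power_01 = estimated_power(alternative, 10.0, cutoff=4.38)  # alpha = .01
print(power_05, power_01)  # the first is the larger: bigger alpha, bigger power
```

Under these assumptions a researcher insisting on $\alpha$ = .01 would miss the real differences considerably more often than one using $\alpha$ = .05.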


15.11

THE ONE-WAY ANOVA WITH
n OBSERVATIONS PER CELL
(SUMMARY)

An experiment with one treatment factor and $J$ levels of that factor is run. Under each of the $J$ levels $n$ independent observations are taken. The assumption is made that the $n$ observations in any one level are independent observations from a normal population with variance $\sigma^2$. The variance $\sigma^2$ is assumed to be equal in all $J$ of the levels. (This assumption is often called the assumption of homoscedasticity; the word "homoscedasticity" derives from two Greek words meaning "same" and "scatter" or "dispersion.")

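The ANOVA follows five steps:


1. A linear model is postulated to explain the data. $X_{ij} = \mu + \alpha_j + e_{ij}$, where $\alpha_1 + \ldots + \alpha_J = 0$ and the $e_{ij}$ are all independent.

2. The null hypothesis $H_0$ and an alternative hypothesis $H_1$ are stated.

$H_0$: $\alpha_1 = \alpha_2 = \ldots = \alpha_J$; $H_1$: "At least two $\alpha_j$'s are different." Or equivalently

$H_0$: $\mu_1 = \mu_2 = \ldots = \mu_J$; $H_1$: $\sum_j (\mu_j - \mu)^2 \neq 0$.

3. A level of significance is chosen: the researcher decides what probability $\alpha$ he will accept that he rejects $H_0$ when it is true (concludes that the treatments differ when they do not). This level is typically .10, .05, .01, or .001. The probability $\alpha$ should be neither so large that there is too great a risk of falsely rejecting $H_0$ nor so small that a false $H_0$ has little chance of being rejected.

4. Calculations of sums of squares [(a) and (b) below], degrees of freedom [(c) and (d) below], and mean squares [(e) below] are made.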


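The data from the experiment may be laid out as follows:

                       Treatments

      1            2          . . .          J

  $X_{11}$     $X_{12}$       . . .      $X_{1J}$
  $X_{21}$     $X_{22}$       . . .      $X_{2J}$
    . . .        . . .                     . . .
  $X_{n1}$     $X_{n2}$       . . .      $X_{nJ}$

  $\bar{X}_{.1}$   $\bar{X}_{.2}$   . . .    $\bar{X}_{.J}$

a. $SS_w = \sum_{j=1}^{J} \sum_{i=1}^{n} (X_{ij} - \bar{X}_{.j})^2$. By algebraic manipulations, $SS_w$ can be written in a form that is more efficient for computation:

$$SS_w = \sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}^2 - \frac{\sum_{j=1}^{J} \left(\sum_{i=1}^{n} X_{ij}\right)^2}{n}.$$

Step 1. To calculate $SS_w$, square each individual observation and sum these squared numbers. You then have $\sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}^2$.

Step 2. Find the sum of the original observations for each of the $J$ columns. The sum of the $j$th column is $\sum_{i=1}^{n} X_{ij}$. Square each of the $J$ column sums and add these $J$ squared numbers together. Divide this sum by $n$. You now have the quantity $\sum_{j=1}^{J} \left(\sum_{i=1}^{n} X_{ij}\right)^2 / n$.

Step 3. Find $SS_w$ by subtracting the result of Step 2 from the result of Step 1.

b. $SS_b = \sum_{j=1}^{J} n(\bar{X}_{.j} - \bar{X}_{..})^2$. Again by algebraic manipulations it may be shown that

$$SS_b = \frac{\sum_{j=1}^{J} \left(\sum_{i=1}^{n} X_{ij}\right)^2}{n} - \frac{\left(\sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}\right)^2}{Jn}.$$

Step 1. Bring forward the value $\sum_{j=1}^{J} \left(\sum_{i=1}^{n} X_{ij}\right)^2 / n$ that was calculated in Step 2 of (a).

Step 2. Add together all of the $Jn$ observations and square this sum. Divide the squared sum by $Jn$. You now have the quantity $\left(\sum_{j=1}^{J} \sum_{i=1}^{n} X_{ij}\right)^2 / Jn$.

Step 3. Find $SS_b$ by subtracting the result of Step 2 from the result of Step 1.

c. The degrees of freedom associated with $SS_w$ are $J(n - 1)$.

d. The degrees of freedom associated with $SS_b$ are $J - 1$.

e. Calculate the $MS_w$ and the $MS_b$ as follows:

$$MS_w = \frac{SS_w}{J(n - 1)}, \qquad MS_b = \frac{SS_b}{J - 1}.$$

5. The $F$-ratio $MS_b/MS_w$ is calculated and compared to the $100(1 - \alpha)$ percentile point in the distribution $F_{J-1,\,J(n-1)}$.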


The test of the null hypothesis is made by comparing the ratio $MS_b/MS_w$, an $F$-ratio with $J - 1$ and $J(n - 1)$ degrees of freedom, with a value obtained from an $F$-table that is the value exceeded by $100\alpha$ percent of the $F$-ratios obtained under conditions of a true null hypothesis. If $MS_b/MS_w$ exceeds that value (denoted by ${}_{1-\alpha}F_{J-1,\,J(n-1)}$), then the researcher rejects $H_0$ as a true statement.
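The computational steps (a) through (e) and the $F$-ratio of step 5 can be sketched as a short Python function. This is one possible arrangement, not a prescribed implementation: the function name `one_way_anova`, the dictionary it returns, and the tiny data set of three groups of three scores are hypothetical and serve only to make the arithmetic easy to follow by hand.

```python
def one_way_anova(groups):
    """One-factor ANOVA for J groups with the same number n of scores,
    using the computational formulas for SS_w and SS_b."""
    J = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal n's")

    sum_of_squares = sum(x * x for g in groups for x in g)         # (a) Step 1
    column_term = sum(sum(g) ** 2 for g in groups) / n             # (a) Step 2
    grand_term = sum(sum(g) for g in groups) ** 2 / (J * n)        # (b) Step 2

    ss_w = sum_of_squares - column_term                            # (a) Step 3
    ss_b = column_term - grand_term                                # (b) Step 3
    df_w, df_b = J * (n - 1), J - 1                                # (c), (d)
    ms_w, ms_b = ss_w / df_w, ss_b / df_b                          # (e)
    return {"SS_b": ss_b, "SS_w": ss_w, "df_b": df_b, "df_w": df_w,
            "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w}


# Hypothetical scores: J = 3 groups, n = 3 scores per group.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(scores)
print(result["SS_b"], result["SS_w"], result["F"])  # 26.0 6.0 13.0
```

The obtained $F$ would then be compared with the tabled $100(1 - \alpha)$ percentile point of the $F$-distribution with `df_b` and `df_w` degrees of freedom; the table lookup is left out of the sketch.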

To illustrate the above calculations, we shall return now to the example 
of Sec. 15.1 in which an experimenter sought to determine the relative 
effectiveness of four different levels of learner activity on learning about the 
English monetary system. Forty experimental subjects were randomly placed 
among the four levels (10 to each level) of learner activity: (1) outline the 
instructional material; (2) write a summary of the material; (3) study an 
instructional program of the material; (4) study the program on a teaching 
machine. Ten subjects studied English money under each of these conditions 
on five successive days; then a 100-item multiple-choice test on the English 
money system was given. A person’s “score” was the number of items 
answered correctly on the 100-item test. These scores appear in Table 15.1. 


TABLE 15.1 SCORES ON A 100-ITEM TEST OVER THE ENGLISH MONETARY
SYSTEM FOR 40 SUBJECTS STUDYING UNDER FOUR LEVELS
OF LEARNER ACTIVITY


     1               2                3               4
  Outline     Written summary      Program     Teaching machine

The calculations for obtaining $SS_b$, $SS_w$, $MS_b$, and $MS_w$ are performed in Table 15.2. In the upper portion of Table 15.2, the data in Table 15.1 are summarized by the necessary quantities.

TABLE 15.2 ILLUSTRATION OF CALCULATIONS FOR A ONE-FACTOR
ANOVA WITH EQUAL n's; DATA FROM TABLE 15.1


Group                                  1          2          3          4

$n$                                   10         10         10         10
$\bar{X}_{.j}$                     48.40      48.00      57.50      63.60
$\sum_{i=1}^{10} X_{ij}$             484        480        575        636
$\sum_{i=1}^{10} X_{ij}^2$        25,024     24,392     33,925     42,034
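The summary quantities in the upper portion of Table 15.2 are all that is needed to carry out the computational formulas of Sec. 15.11. The sketch below redoes that arithmetic in Python so that each intermediate value can be inspected; the group sums and sums of squares are taken from Table 15.2, while the variable names are our own.

```python
J, n = 4, 10
group_sums = [484, 480, 575, 636]                  # sum of X_ij in each group
group_sums_of_squares = [25024, 24392, 33925, 42034]  # sum of X_ij^2 in each group

sum_of_squares = sum(group_sums_of_squares)             # all 40 squared scores
column_term = sum(s ** 2 for s in group_sums) / n       # sum of (group sum)^2 / n
grand_term = sum(group_sums) ** 2 / (J * n)             # (grand sum)^2 / Jn

ss_w = sum_of_squares - column_term
ss_b = column_term - grand_term
ms_w = ss_w / (J * (n - 1))   # 36 degrees of freedom
ms_b = ss_b / (J - 1)         # 3 degrees of freedom
f_obtained = ms_b / ms_w

print(round(ms_b, 2))         # 570.69
print(round(f_obtained, 2))   # 3.81
```

The value of $MS_w$ comes out at 149.925, which the text reports as 149.93.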



The null hypothesis $H_0$, that $\mu_1 = \mu_2 = \mu_3 = \mu_4$, is tested by referring $F = MS_b/MS_w$ to the $F$-distributions in Table E of Appendix A. The obtained value of $F$ is located relative to various percentile points of the $F$-distribution with 3 and 36 degrees of freedom to determine whether it could reasonably be regarded as an observation randomly sampled from that distribution, which it must be if $H_0$ is true.

The value of the $F$-ratio for the data in Tables 15.1 and 15.2 is $F = 570.69/149.93 = 3.81$. In Table E we can find the percentile points in $F_{3,30}$ and $F_{3,40}$ but not in $F_{3,36}$ since not all $F$-distributions are tabulated there. We can, however, interpolate between 30 and 40 to obtain approximate values of the percentile points in $F_{3,36}$. The interpolation is performed with the reciprocals of the degrees of freedom instead of the degrees of freedom themselves, e.g., to find ${}_{.95}F_{3,36}$ one solves the following equation:

$$\frac{\frac{1}{30} - \frac{1}{36}}{\frac{1}{30} - \frac{1}{40}} = \frac{{}_{.95}F_{3,30} - {}_{.95}F_{3,36}}{{}_{.95}F_{3,30} - {}_{.95}F_{3,40}}.$$
The following percentile points are obtained by interpolation in Table E:

${}_{.75}F_{3,36} = 1.43$

${}_{.90}F_{3,36} = 2.25$

${}_{.95}F_{3,36} = 2.86$

$F = 3.81$ (the obtained value of $F = MS_b/MS_w$)

${}_{.99}F_{3,36} = 4.38$
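The interpolation on reciprocals of the degrees of freedom is easy to mechanize. The following sketch solves the interpolation equation for the unknown percentile point. The function name is hypothetical, and the two tabled 95th percentiles used as inputs, 2.92 for $F_{3,30}$ and 2.84 for $F_{3,40}$, are standard published values that we assume agree with Table E; they are not quoted in the text above.

```python
def interpolate_percentile(df_target, df_low, f_low, df_high, f_high):
    """Interpolate an F percentile point linearly in 1/df, where df is the
    denominator degrees of freedom and f_low, f_high are the tabled values."""
    weight = (1 / df_low - 1 / df_target) / (1 / df_low - 1 / df_high)
    return f_low - weight * (f_low - f_high)


# Assumed tabled values: .95 F(3, 30) = 2.92 and .95 F(3, 40) = 2.84.
f95_3_36 = interpolate_percentile(36, 30, 2.92, 40, 2.84)
print(round(f95_3_36, 2))  # 2.87, close to the 2.86 listed in the text
```

With 30, 36, and 40 degrees of freedom the weight is exactly 2/3, so the interpolated point lies two-thirds of the way from the tabled value for 30 toward the tabled value for 40.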


The single obtained value of $F$ from the data in Table 15.1 lies between the 95th and the 99th percentiles in the $F$-distribution with 3 and 36 degrees of freedom. In fact it lies at about the 98th percentile in the $F_{3,36}$ distribution. Hence, if $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ were true, an $F$-ratio as large as 3.81 or larger would occur with a probability approximately equal to .02. If we were testing $H_0$ at the $\alpha = .05$ level in this instance, we would reject $H_0$ in favor of $H_1$, which held that not all four $\mu_j$ are equal. If we adhered rigidly to the $\alpha = .01$ level of significance we would not reject $H_0$. We would be inclined to proceed as though $H_0$ were false.


15.12 

THE ONE-WAY ANOVA 
WITH UNEQUAL n's 


The numbers of observations may be unequal in the levels of a one-factor experiment. For example, the number of persons observed under level 1 of the factor might differ from the 10 and 20 persons observed under levels 2 and 3, respectively. The technique of analysis of variance which was presented in the preceding sections of this chapter is easily modified to accommodate these "unequal n's." The rationale is essentially the same; only slight modifications are made. The computational techniques are quite similar to those in the equal n's ANOVA.


Notation 


In the equal n's ANOVA, each group contained the same number of scores, $n$. Since the groups may now contain different numbers of scores, it will be necessary to denote the number of scores in the first group as $n_1$, the number of scores in the second group as $n_2$, and the number of scores in the $j$th group as $n_j$. Data gathered in a design that is to be analyzed with a one-factor unequal n's ANOVA could be depicted as follows:

  Group 1        Group 2        Group 3

  $X_{11}$       $X_{12}$       $X_{13}$
  $X_{21}$       $X_{22}$       $X_{23}$
    . . .          . . .          . . .
                 $X_{15,2}$
                                $X_{18,3}$
  $X_{20,1}$

In the above example, group 1 contains 20 scores, group 2 contains 15 scores, and group 3 contains 18 scores. Consequently, $n_1 = 20$, $n_2 = 15$, and $n_3 = 18$.

The Model 

The same assumptions about parent populations are made that were made 
for the equal n's ANOVA: the J samples are randomly drawn from normal 
populations with equal variances cr*. and they are independent. As before, 
it is assumed that the scores X if can be thought of in terms of the following 
linear model: 

X t) = ft + + e ( „ 


where X u is the /th score in the /th group, 

ft is the average of the J population means, 

a, is the difference between the mean of the /th population, p t , and 
ft, and 

e lf is the difference between X (f and ft,, the mean of the /th population. 
The subscript i runs from 1 to in the /th group. We shall denote the total 
number of scores in all groups by N ; of course, N = n a + n z + . . . 4- rij. 

A restriction placed on the equal n's ANOVA was that $\alpha_1 + \ldots + \alpha_J = 0$. In the unequal n's ANOVA this assumption becomes $n_1\alpha_1 + n_2\alpha_2 + \ldots + n_J\alpha_J = 0$. This is the only substantial alteration of the theoretical model, and you should not be greatly concerned.

As before, the least-squares estimators of the treatment effects, $\alpha_j$, are obtained and become the basis of tests of the hypothesis that in the population all $\alpha_j = 0$, i.e., all J population means are equal. The least-squares estimator of $\alpha_j$ is $\bar{X}_{.j} - \bar{X}_{..}$, i.e., the mean of the $n_j$ scores in the jth group minus the mean of all scores:

$$\hat{\alpha}_j = \bar{X}_{.j} - \bar{X}_{..}.$$

Again, the least-squares estimator of $\mu$ is simply

$$\hat{\mu} = \bar{X}_{..}.$$

The estimator of the error component, $e_{ij}$, is given by

$$\hat{e}_{ij} = X_{ij} - \bar{X}_{.j}.$$

The null hypothesis we wish to test is that in the population all of the $\alpha_j$ are equal, and hence equal zero. In symbols, the null hypothesis is as follows:

$$H_0\colon \alpha_j = 0 \text{ for all } j.$$

An equivalent statement of $H_0$ in terms of the J population means is

$$H_0\colon \mu_1 = \mu_2 = \ldots = \mu_J.$$

Sums of Squares 

As before, the route to a statistical significance test of $H_0$ goes by way of the sum of squared estimates of the treatment effects, $\hat{\alpha}_j$, and the sum of squared estimates of the errors, $\hat{e}_{ij}$.

First, a sum of squares within, $SS_w$, and a sum of squares between, $SS_b$, are found. $SS_w$ is simply a weighted sum of the J sample variances for each group:

$$SS_w = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \ldots + (n_J - 1)s_J^2$$

$$\phantom{SS_w} = \sum_{i=1}^{n_1}(X_{i1} - \bar{X}_{.1})^2 + \sum_{i=1}^{n_2}(X_{i2} - \bar{X}_{.2})^2 + \ldots + \sum_{i=1}^{n_J}(X_{iJ} - \bar{X}_{.J})^2.$$

A simpler notation for the above weighted sum is:

$$SS_w = \sum_{j=1}^{J}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_{.j})^2.$$

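Using sigma notation, $SS_w$ can also be written in computational form:

$$SS_w = \sum_{j=1}^{J}\sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{J}\frac{\left(\sum_{i=1}^{n_j} X_{ij}\right)^2}{n_j}.$$

In words,

1. Square each of the N scores and add them up to obtain the first term.

2. Sum the scores in the jth group, square this sum, and divide by $n_j$. Do this for all J groups.

3. Sum the J quantities obtained in step 2.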

xl, ~ y— =i — ~ 
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Therefore 

$$E\left(\frac{SS_w}{N - J}\right) = E(MS_w) = \sigma^2.$$

Whether $H_0$ is true or false, the long-run average value of $MS_w$ (which would be obtained by averaging the $MS_w$'s from a huge number of instances of running the same experiment) is equal to $\sigma^2$, the common variance of each of the J populations.

The value of $E(MS_b)$ is somewhat more difficult to derive, so we state without proof that the following is true:

$$E(MS_b) = \sigma^2 + \frac{\sum_{j=1}^{J} n_j(\mu_j - \mu)^2}{J - 1}.$$

The formula for $E(MS_b)$ for the unequal n's ANOVA is quite similar to $E(MS_b)$ for the equal n's case. [See Eq. (15.8).] They differ only in that the factor n, by which all squared deviations $\mu_j - \mu$ are weighted in the equal n's case, becomes a differential weight $n_j$ for each squared deviation in the unequal n's case.

The essential property of $MS_b$ is the same as when n's were equal, and this property is best revealed by inspecting $E(MS_b)$. When $H_0$ is true, one expects $MS_b$ and $MS_w$ to be the same size, namely $\sigma^2$. When $H_0$ is false, one expects $MS_b$ to be larger than $MS_w$. These considerations, and others concerning distribution theory quite like that in the equal n's case, provide the techniques for making the F-test of $H_0$.
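
As a rough numerical illustration of the expression for $E(MS_b)$, the sketch below evaluates it for made-up population values. The function name, the example numbers, and the use of the n-weighted mean for $\mu$ (the choice implied by the restriction $n_1\alpha_1 + \ldots + n_J\alpha_J = 0$) are assumptions made for illustration only.

```python
def expected_ms_between(sigma2, mus, ns):
    """E(MS_b) = sigma^2 + sum_j n_j (mu_j - mu)^2 / (J - 1).

    Assumption: mu is the n-weighted mean of the population means,
    which is what the restriction sum_j n_j * alpha_j = 0 implies.
    """
    J = len(mus)
    N = sum(ns)
    mu = sum(n * m for n, m in zip(ns, mus)) / N
    weighted_sq_dev = sum(n * (m - mu) ** 2 for n, m in zip(ns, mus))
    return sigma2 + weighted_sq_dev / (J - 1)


# Hypothetical populations: sigma^2 = 4, group sizes 5, 10, and 5.
ms_b_h0_false = expected_ms_between(4.0, [10, 12, 14], [5, 10, 5])
ms_b_h0_true = expected_ms_between(4.0, [12, 12, 12], [5, 10, 5])
print(ms_b_h0_false, ms_b_h0_true)
```

When all population means are equal, the second term vanishes and the expected value reduces to $\sigma^2$, which is the expected value of $MS_w$ as well.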

The F-test of $H_0$: Rationale
and Procedure

When the null hypothesis is true, the ratio $MS_b/MS_w$ will follow a central F-distribution with J − 1 and N − J degrees of freedom. When the null hypothesis is false, the ratio $MS_b/MS_w$ will follow a noncentral F-distribution, which has a larger mean than the central F-distribution and more of its area above the bulk of the area of $F_{J-1,N-J}$ (see Fig. 15.2).

The strategy by which $H_0$ is tested is no different from what it was in the equal n's case. One adopts an $\alpha$, finds the 100(1 − $\alpha$) percentile point in the table of the F-distribution which has J − 1 and N − J degrees of freedom, compares $MS_b/MS_w$ with ${}_{1-\alpha}F_{J-1,N-J}$, and decides to accept or reject $H_0$. If $MS_b/MS_w$ exceeds ${}_{1-\alpha}F_{J-1,N-J}$, then it is considered unlikely that $H_0$ is true; to maintain that $H_0$ is true forces one to regard the obtained F-ratio as an unlikely event (in fact, an event with probability $\alpha$ or less of happening). It would make better sense to regard this large F-ratio as one of the typical values of the noncentral F-distribution that would describe the distribution of $MS_b/MS_w$ if $H_0$ were false.
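
The computations behind the F-ratio can be sketched in a few lines of code. The function below is a minimal illustration, not a definitive implementation: its name, the returned dictionary keys, and the small data set are hypothetical. It uses the standard computational forms of the between and within sums of squares and allows the group sizes to differ.

```python
def one_way_anova(groups):
    """One-way ANOVA for J groups of possibly unequal size.

    `groups` is a list of lists of scores. Returns the sums of squares,
    degrees of freedom, mean squares, and the F-ratio MS_b / MS_w.
    """
    J = len(groups)
    N = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    sum_sq_scores = sum(x * x for g in groups for x in g)
    # Sum over groups of (group total)^2 / n_j.
    group_term = sum(sum(g) ** 2 / len(g) for g in groups)

    ss_between = group_term - grand_total ** 2 / N
    ss_within = sum_sq_scores - group_term
    df_between, df_within = J - 1, N - J
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS_b": ss_between, "SS_w": ss_within,
        "df_b": df_between, "df_w": df_within,
        "MS_b": ms_between, "MS_w": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores for three groups with n_1 = 3, n_2 = 2, n_3 = 4.
result = one_way_anova([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
print(result)
```

The resulting F would then be compared with the tabled percentile point for J − 1 and N − J degrees of freedom, exactly as described above.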
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An Example 


An experiment was performed on the effect of similarity to a social group on conforming behavior. A group of experimental subjects was divided into three groups: low similarity (subjects in this group were informed that their expressed opinions were generally at variance with those of college students in general); medium similarity (subjects were told that their opinions agreed with those of college students in general only moderately); and high similarity (subjects were told that their opinions were usually exactly like those of students in general). After subjects were thus informed, they were asked to express their opinions about 18 current issues (capital punishment, birth control, etc.). Before expressing their opinion, however, the subjects were told how students in general felt about each issue. The number of times out of 18 that a subject's expressed opinion was the same as what was portrayed to be the opinion of students in general was taken as the number of times the subject "conformed to majority opinion." The "conformity scores" for the 60 subjects are presented in Table 15.3.


TABLE 15.3 CONFORMITY SCORES

^^tTInoenfral 10 





14 12 10 
17 14 12 10 
16 14 12 10 
15 14 12 10 
14 13 H 9 
14 13 11 


Twelve Ss were observed under the medium similarity condition; 24 were observed under low and 24 under high similarity. The null hypothesis of interest is that in the populations from which these samples can be considered to have been drawn at random (these are hypothetical populations of comparable persons participating in this same experiment) the means of the three populations are equal. That is, if we number the groups 1, 2, and 3 from low to high, then $H_0$: $\mu_1 = \mu_2 = \mu_3$.

The calculation of the sums of squares is performed in Table 15.4. The degrees of freedom for $MS_b$ and $MS_w$ are J − 1 = 2 and 60 − 3 = 57, respectively. Hence, $MS_b$ = 10.208 and $MS_w$ = 304.167/57 = 5.336. The ANOVA table is as follows:

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Source of 

variation          df       MS        F

Between groups      2     10.208     1.91

Within groups      57      5.336

To test the null hypothesis $H_0$ that $\mu_1 = \mu_2 = \mu_3$ (or equivalently that all $\alpha_j = 0$), refer the obtained F-ratio of 1.91 to the F-distribution with 2 and 57 degrees of freedom. The 75th and 90th percentiles in the distribution $F_{2,57}$ are 1.42 and 2.40, respectively. In replications of the experiment in Table 15.3, F-ratios exceeding 2.40 will be obtained 10% of the time when the null hypothesis is true. The obtained F-ratio of 1.91 will not allow confident rejection of the null hypothesis.
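
The arithmetic of this example can be checked with a short script. It is only a sketch: the variable names are made up, and the numbers are the sum of squares within, the mean square between, and the two percentile points quoted in the text.

```python
# Quantities reported in the example.
ss_within = 304.167
ms_between = 10.208
df_between, df_within = 2, 57

ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# 75th and 90th percentiles of F with 2 and 57 df, as quoted in the text.
f_75, f_90 = 1.42, 2.40
exceeds_75th = f_ratio > f_75
exceeds_90th = f_ratio > f_90

print(round(ms_within, 3), round(f_ratio, 2), exceeds_75th, exceeds_90th)
```

The obtained ratio falls between the two percentile points, which is why the null hypothesis cannot be rejected with an $\alpha$ of .10.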

TABLE 15.4 ILLUSTRATION OF CALCULATION OF SUMS OF SQUARES FOR
A ONE-FACTOR ANOVA WITH UNEQUAL n's; DATA FROM
TABLE 15.3

1          2          3


$n_1 = 24$     $n_2 = 12$     $n_3 = 24$

2 X, - 333 2 X„ - 149 £ - 3® 

X m 2 •»» 

l-l *-l l-l <-l 

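$$SS_b = \sum_{j=1}^{3}\frac{\left(\sum_{i} X_{ij}\right)^2}{n_j} - \frac{\left(\sum_{j}\sum_{i} X_{ij}\right)^2}{N} = 8780.833 - 8760.417 = 20.416$$

$$SS_w = \sum_{j=1}^{3}\sum_{i} X_{ij}^2 - \sum_{j=1}^{3}\frac{\left(\sum_{i} X_{ij}\right)^2}{n_j} = 9085 - 8780.833 = 304.167$$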


15.13 

CONSEQUENCES OF FAILURE 
TO MEET THE ANOVA 
ASSUMPTIONS: THE 
"ROBUSTNESS" OF ANOVA


The problem of what happens (to levels of significance and power) when the assumptions underlying an analysis of variance model are violated presents considerable difficulty to the mathematical statistician. This is to

variances and normality. The actual significance levels are based on samples of 1000 t-ratios. Entire frequency distributions of the t-ratios obtained when t-tests are performed on nonnormal populations or populations with unequal variances are reproduced in the 1960 article by Boneau. These graphs are illuminating and ought to be studied. We shall give only a few of the more easily reproduced findings from Boneau's work. These results appear in Table 15.7. Try to integrate Boneau's findings with those of Box and Pearson.

In summary, the fixed-effects ANOVA appears to be remarkably insensitive to departures from normality; and when n's are equal, it is equally unaffected by heterogeneous variances. Box has used the word "robustness" for this insensitivity of a statistical test to violation of its assumptions.


15.14 

TESTING HOMOGENEITY OF 
VARIANCES 


Testing the homogeneity of population variances is of interest (1) when one wishes to make inferences about population variances because they are of scientific interest, and (2) when one wishes to check the assumption of homogeneous variances in an analysis of variance.

Bartlett's test has been found to be quite sensitive to the assumption that the samples come from normal populations (Box, 1953). If nonnormal populations are sampled, the probability of a type I error with Bartlett's test may be far greater than the nominal $\alpha$. Indeed, the test is so sensitive to the assumption of normality that it may even be a good test of normality (Box, 1953). A test due to Scheffé is much less sensitive to violation of the normality assumption, however (Scheffé, 1959). A test devised by Hartley (see Biometrika Tables for Statisticians, pp. 60-61) is a useful short-cut test of $H_0$. It is performed by referring the ratio of the largest sample variance to the smallest to a special table (Table 31, Biometrika Tables).
observations provide about heteroeeneitT 1 ^ *- H ° f lhC lnforma,i °n the 

■sJa ss 
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that there is evidence that the tests of Bartlett, Hartley, and Cochran are 
sensitive to the assumption, upon which each rests, that the J samples are 
from normal distributions. The two latter tests also require special tables 
for their execution. 

Levene (1960) proposed a test of $H_0$ that is both simple to run (it uses ordinary analysis of variance techniques) and insensitive to violation of the normality assumption in most cases. Levene's test is simply a one-way analysis of variance on the absolute values of the differences between each observation and the mean of its group. For example, the data from an experiment have the following layout:

Treatment
1           2          ...     J
$X_{11}$    $X_{12}$   ...     $X_{1J}$
  .           .                  .
  .           .                  .
$X_{n1}$    $X_{n2}$   ...     $X_{nJ}$

One wishes to test the hypothesis $H_0$: $\sigma_1^2 = \ldots = \sigma_J^2$. No assumption about the shapes of the distributions underlying the J samples is made. A statistic will be appropriate for testing $H_0$ on the above data provided that (1) one can approximate the 95th and 99th percentiles, say, in the distribution of the statistic when $H_0$ is true, and (2) the test based on the statistic has good power for rejecting $H_0$ when it is false. The test statistic for Levene's test is the ratio of $MS_b$ to $MS_w$ of transformed scores $Z_{ij}$ that are related to the $X_{ij}$'s by $Z_{ij} = |X_{ij} - \bar{X}_{.j}|$. If this F-ratio exceeds the 1 − $\alpha$ percentile point of the ordinary F-distribution with degrees of freedom J − 1 and J(n − 1), one concludes with approximately 1 − $\alpha$ confidence that the population variances are different.
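
The test statistic just described can be sketched as follows. This is an illustrative sketch only: the function name and the example scores are hypothetical, and the code simply forms the $Z_{ij}$ as absolute deviations from group means and computes the ordinary one-way F-ratio on them.

```python
def levene_f(groups):
    """F-ratio of Levene's test: a one-way ANOVA on Z_ij = |X_ij - group mean|."""
    # Transform every score to its absolute deviation from its group mean.
    z = []
    for g in groups:
        mean_g = sum(g) / len(g)
        z.append([abs(x - mean_g) for x in g])

    J = len(z)
    N = sum(len(g) for g in z)
    grand_mean = sum(sum(g) for g in z) / N
    z_means = [sum(g) / len(g) for g in z]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, z_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (ss_between / (J - 1)) / (ss_within / (N - J))


# Hypothetical data: the second group is three times as spread out as the first.
f_unequal_spread = levene_f([[0, 2, 4], [0, 6, 12]])
# Hypothetical data: the same spread in both groups, different means.
f_equal_spread = levene_f([[0, 2, 4], [10, 12, 14]])
print(f_unequal_spread, f_equal_spread)
```

Note that shifting a whole group by a constant changes its mean but not its $Z_{ij}$, so the statistic responds to differences in spread, not to differences in location.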

Levene's paper (1960) reported on his investigation of the properties of the above test. Among other things, he studied the correspondence between the probability of obtaining a significant F-ratio on the $Z_{ij}$'s and the nominal probability adopted by the test user. That is, the correspondence between the sampling distribution of the F-ratio on the $Z_{ij}$'s and the F-distribution was examined. It was anticipated that this correspondence would be close due to the widely recognized "robustness" of the fixed-effects analysis of variance. Since the fixed-effects analysis of variance is robust for equal numbers of observations per group, Levene restricted attention to this case. The robustness of Levene's test has thus been examined only for equal n's. The mathematics for Levene's investigation proved to be intractable; thus empirical sampling distributions of F-ratios calculated on the $Z_{ij}$'s were compared with the F-distribution with J − 1 and J(n − 1) degrees of freedom. In general, the agreement of the nominal (based on the F-distribution) and the empirical (based on 1000 observations) probabilities of rejecting a true $H_0$ was remarkably close.

Levene also investigated the power of his test for rejecting $H_0$. The power was found to be generally quite satisfactory. Assuming normal distributions and J = 2, the efficiency of Levene's test relative to the exact F-test was good. These efficiencies ranged between .75 and .96 for various values of n and various alternative hypotheses ($\sigma_1^2/\sigma_2^2$). For J samples the efficiency of Levene's test relative to the approximate efficiency of Bartlett's test, assuming normal distributions, ranged from .83 to .90.

In summary, Levene's test appears to be a "robust" test (for equal numbers of observations in J samples) of the hypothesis of equal population variances that has satisfactory power to reject alternative hypotheses. Its ease of calculation and use of standard tables of the F-distribution make it an attractive alternative to more laborious and less robust tests that have
been in common use. (See Glass, 1966tf.) 


15.15 

POWER OF THE F-TEST 


The power of a particular F-test depends on four quantities: the degrees of freedom "between" denoted by $n_1$, the degrees of freedom "within" denoted by $n_2$, a quantity denoted by $\phi$, which is a measure of the degree of "falseness" of the null hypothesis, and $\alpha$, the level of significance of the test. Of course, in the one-factor ANOVA the value of $n_1$ is J − 1, and the value of $n_2$ is N − J. The power of the F-test is defined for a particular set of values of $\mu_1, \ldots, \mu_J$. The quantity $\phi$ has the following definition:

$$\phi = \sqrt{\frac{n\sum_{j=1}^{J}(\mu_j - \mu_.)^2}{J\sigma^2}},$$

where $\mu_. = (\mu_1 + \ldots + \mu_J)/J$, which also equals $\mu$.

The calculation of $\phi$ involves $\sigma^2$, the population variance common to all J populations. Normally $\sigma^2$ will not be known; this necessitates either gathering preliminary data to estimate it, making a shrewd guess as to its value, or measuring the differences among the $\mu_j$ in $\sigma$-units so that $\sigma^2$ need not be known (e.g., what is the power of the F-test if $\mu_1 - \mu_2$ equals $\sigma/2$?).

Once $n_1$, $n_2$, $\phi$, and $\alpha$ are known, the power of the F-test can be determined from Table N in Appendix A. For example, suppose that an experiment is performed with J = 3 treatments. Assume that there are n = 11 observations per group, and that the test of the null hypothesis is to be run at the .05 level. We wish to calculate the power of the F-test





for the following alternative hypothesis:

$$\mu_1 = 68, \quad \mu_2 = 66, \quad \mu_3 = 64.$$

Suppose that previous experiments on similar phenomena lead one to believe that $\sigma^2$ is reasonably close to 20. The value of $\phi$ is

$$\phi = \sqrt{\frac{11[(68 - 66)^2 + (66 - 66)^2 + (64 - 66)^2]}{3 \cdot 20}} = \sqrt{\frac{88}{60}} = 1.21.$$

Table N is entered with $n_1$ = (J − 1) = 2, $n_2$ = (N − J) = 30, $\alpha$ = .05, and $\phi$ = 1.21. The power of the F-test of $H_0$ against the alternative hypothesis that $\mu_1$ = 68, $\mu_2$ = 66, and $\mu_3$ = 64 is approximately 0.40. Thus, there are four chances in 10 of rejecting $H_0$ in favor of $H_1$ if in fact $\mu_1$ = 68, $\mu_2$ = 66, and $\mu_3$ = 64.

Verify that for $n_1$ = J − 1 = 1, $n_2$ = N − J = 60, $\alpha$ = .01, and $\phi$ = 3.00, the power of the F-test is approximately 0.94.
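
The value of $\phi$ in the worked example can be reproduced with a small helper. This is a sketch under the equal-n definition used in this section; the function name is hypothetical, and the power itself must still be read from the power table, which the code does not attempt to reproduce.

```python
import math


def phi(mus, n, sigma2):
    """phi = sqrt( n * sum_j (mu_j - mu.)^2 / (J * sigma^2) ) for equal n's."""
    J = len(mus)
    mu_dot = sum(mus) / J
    return math.sqrt(n * sum((m - mu_dot) ** 2 for m in mus) / (J * sigma2))


# Values from the worked example: three means, n = 11 per group, sigma^2 = 20.
phi_example = phi([68, 66, 64], n=11, sigma2=20)
print(round(phi_example, 2))
```

Because $\phi$ grows with the square root of n, quadrupling the number of observations per group doubles $\phi$ for the same set of population means.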


PROBLEMS AND EXERCISES 

1. Calculate the degrees of freedom for both $MS_b$ and $MS_w$ in each of the following instances:

a. J = 2, n = 4.                              b. J = 5, n = 2.

c. J = 3; $n_1$ = 3, $n_2$ = 6, $n_3$ = 4.    d. J = 3; $n_1$ = 4, $n_2$ = 1, $n_3$ = 5.

2. Determine the critical value of F = $MS_b/MS_w$ for testing $H_0$ in each of the following instances:

a. J = 2; n = 6; $\alpha$ = .01.              b. J = 5; n = 7; $\alpha$ = .10.
c. J = 3; $n_1$ = 4, $n_2$ = 6, $n_3$ = 8; $\alpha$ = .05.

3. Which one or more of the following statements are equivalent to the alternative hypothesis in the one-factor ANOVA?

a. $\mu_j \ne \mu_{j^*}$ for some j and j*.           b. $\mu_1 \ne \mu_2 \ne \ldots \ne \mu_{J-1} \ne \mu_J$.

c. $\sum_{j=1}^{J}(\mu_j - \mu)^2 \ne 0$.              d. $\mu_j \ne \mu_{j^*}$ for all pairs of j and j*.

4. Given only the following data in an ANOVA table, determine $MS_b$, $MS_w$, and F.

Source of variation    df      SS       MS      F

Between groups          4     81.25

Within groups

Total                  49    378.60
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5. The report of on experiment In which three treatment groups were compared 
contained only the following data: 

$n_1 = 10$          $n_2 = 10$          $n_3 = 10$

X.-KH.65 X,- 112.41 X, -105.06 

s, - 12.84 i, - 15.M s, - 16.55 

Using these summary statistics, calculate $MS_b$ and $MS_w$ and test $H_0$: $\mu_1 = \mu_2 = \mu_3$ at the .01 level of significance.

6. In Prob. 2 at the end of Chapter 14, a t-test of the significance of the difference between two means was called for on the following data:

Group 1                    Group 2

$n_1 = 10$                 $n_2 = 10$

$\bar{X}_{.1} = 19.20$     $\bar{X}_{.2} = 11.30$

$s_1 = 7.93$               $s_2 = 5.79$

The value of t was 2.55, which was significant at the .05 level. Perform a one-way ANOVA with J = 2 on the above data and show that F = $MS_b/MS_w$ equals $t^2 = (2.55)^2$, which is also significant at the .05 level. Thus, verify for one particular case that the one-way ANOVA for two groups is equivalent to a t-test of $H_0$: $\mu_1 = \mu_2$.

7. Guthrie (1967) studied the effectiveness of three different modes of training on deciphering cryptograms. Group I was trained by first being presented rules for deciphering cryptograms, then working examples. Group II worked examples first, then was told the rules. Group III worked only examples. A control group studied Russian vocabulary during the training period. A group of 72 subjects was randomly split into four groups of 18 each and assigned to the four training conditions. After training, the Ss were given a 10-item test comprising 10 new cryptograms like those studied during training. The means and standard deviations of the number of cryptograms solved on the criterion test in each group are as follows:

Group I            Group II           Group III
(rule-example)     (example-rule)     (example)        Control

$s_1 = 2.37$       $s_2 = 2.31$       $s_3 = 2.42$     $s_4 = 2.27$

Perform a one-way ANOVA on Guthrie's data to test the null hypothesis that the means of the populations underlying the four groups are equal. Test $H_0$ at the .05 level of significance. (Save your calculations because they will be used again in the problems at the end of Chapter 16.)


8. Three methods of teaching foreign language vocabulary were compared in an experiment. To evaluate the instruction, a 50-item vocabulary test was administered to the 24 students in the experiment; eight students were in each group. The following data, expressed as number of correct items out of 50, were obtained:

Aural-oral          Translation         Combined
method              method              methods

$X_{11} = 19$       $X_{12} = 21$       $X_{13} = 17$
$X_{21} = 37$       $X_{22} = 18$       $X_{23} = 20$
$X_{31} = 28$       $X_{32} = 18$       $X_{33} = 28$
$X_{41} = 31$       $X_{42} = 23$       $X_{43} = 30$
$X_{51} = 29$       $X_{52} = 20$       $X_{53} = 13$
$X_{61} = 25$       $X_{62} = 22$       $X_{63} = 18$
$X_{71} = 36$       $X_{72} = 26$       $X_{73} = 19$
$X_{81} = 33$       $X_{82} = 14$       $X_{83} = 23$


Perform an F-test of the null hypothesis that $\mu_1 = \mu_2 = \mu_3$. (Save your results for the exercises at the end of Chapter 17.)

9. Harrington (1968) experimented with the sequencing of instructional material and organizing material that structured the material for the learner. A group of 30 persons were randomly split into three groups of 10 each. Group I received organizing material before studying instructional materials on mathematics; group II received the organizing material after studying the mathematics; group III received no organizing material in connection with studying the mathematics instructional materials. On a 10-item test over the mathematics covered in the instructional materials, the following scores were earned.

Group I               Group II              Group III
(pre-organizer)       (post-organizer)      (no organizer)


Perform a one-factor ANOVA; test the null hypothesis that $\mu_1 = \mu_2 = \mu_3$ at any level of significance you wish.

10. Perform an F-test of the null hypothesis $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ on the following data, which represent weight losses in pounds of subjects under four different diets.

Diet A       Diet B       Diet C       Diet D

*11 


*21 = 8 
*31= 3 
*41 = 5 
*81=« 


*13 ” 21 
X S3 = 20 
*33 = 17 
*43 - 16 


*14 — 3 

X u = 9 
*34 = 10 

*44 =7 

*44 “7 
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11 . The null hypothesis ?V- Pi 


,, s is to be tested with an * of .05 and with 
.amntesoE „ -S. Sunposethat unknown to the experimenter, H - 10, 
rsr-i - 1 "end - Id; also, the value of «* is 8.00. What 
is the power of the F-test of $H_0$ under these circumstances?

12. Ten samples of 20 scores each are drawn at random from a single normal population with mean $\mu$ and variance $\sigma^2$. The sample means of the 10 samples have a variance of 2.40, i.e.,

$$s_{\bar{X}}^2 = \frac{\sum_{j=1}^{10}(\bar{X}_{.j} - \bar{X}_{..})^2}{9} = 2.40.$$

Find an estimate of $\sigma^2$, the variance of the original normal population sampled.

13. Unknown to the experimenter, the value of $\sigma^2$ is 20, and $\mu_1$ = 10, $\mu_2$ = 15, and $\mu_3$ = 20 in the three populations from which he has drawn samples of size n = 10. What is the expected value of $MS_b$ in this experiment, i.e., what would be the average value of $MS_b$ over an infinite number of replications of this same experiment?

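14. Prove that

$$SS_w = \sum_{j=1}^{J}\left[\sum_{i=1}^{n}(X_{ij} - \bar{X}_{.j})^2\right]$$

is equal to its computational formula

$$SS_w = \sum_{j=1}^{J}\sum_{i=1}^{n} X_{ij}^2 - \sum_{j=1}^{J}\frac{\left(\sum_{i=1}^{n} X_{ij}\right)^2}{n}.$$

(Hint: First show that $\sum_{i=1}^{n}(X_{ij} - \bar{X}_{.j})^2$ equals $\sum_{i=1}^{n} X_{ij}^2 - \left(\sum_{i=1}^{n} X_{ij}\right)^2/n$, as was done in Chapter 5. Then sum the J sums of squares for each group across j.)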

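15. Prove that the value of F = $MS_b/MS_w$ in a one-way ANOVA with J = 2 equals the square of the value of t for testing the significance of the difference between the two means, i.e., prove that

$$t^2 = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{[(s_1^2 + s_2^2)/2](2/n)}$$

equals

$$F = \frac{n[(\bar{X}_{.1} - \bar{X}_{..})^2 + (\bar{X}_{.2} - \bar{X}_{..})^2]}{(s_1^2 + s_2^2)/2}.$$

(Hint: Start by noting that $\bar{X}_{..} = (\bar{X}_{.1} + \bar{X}_{.2})/2$, and then manipulate the numerator of F into the form $n(\bar{X}_{.1} - \bar{X}_{.2})^2/2$.)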


Now that you have completed this chapter on the analysis of variance, you may wish to read Stanley's (1968a) article to review some points already covered and to introduce certain other concepts that will be discussed in later chapters.
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than a simple one-factor ANOVA, both experimenters will end at precisely 
the same point, i.e., concluding that $H_0$ is false with a high level of significance
(low a), although the patterns of the differences between the population 
means in the two experiments are quite different. This is indeed an un- 
satisfactory situation. 


Most multiple comparison procedures are designed to be used after the null hypothesis of no treatment differences in an ANOVA has been rejected. The purpose of these procedures is the isolation of comparisons between means that are responsible for or contributed to the rejection of $H_0$. For example, in the first experiment above, comparing $\mu_1$ and $\mu_2$ by forming $\mu_1 - \mu_2$ = 20 − 20 = 0 reveals that these two means would not have given a high probability of rejecting $H_0$ if they were the only two treatment groups present. However, $\mu_4 - \mu_1$ = 30 − 20 = 10 was partly responsible for the significance of the obtained results. The application of multiple comparison procedures to the results of the first experiment above should have a high probability of leading the experimenter to conclude that $\mu_1$, $\mu_2$, and $\mu_3$ are the same and $\mu_4$ is different from (and greater than) all three of them.

One's first reaction to this problem is probably the thought that t-tests performed on all possible pairs of means involved in the F-test will reveal where significant differences between means lie. This is not so. It is quite a different matter to expect the t-test to be valid for testing the significance of the difference between the smallest and largest sample means in a collection of J means. A t-test applied to the largest and smallest means takes no account of how large J is. Isn't it clear that if we allow J to be 50 and we draw 50 samples randomly from the same normal population, a t-test will show the largest and smallest to be "significantly different" a large proportion of the time (a far larger proportion than $\alpha$)? In spite of the patent invalidity of t-testing following a significant F in an analysis of variance, or multiple t-testing in lieu of the analysis of variance, this method has often been and continues to be used.

There are several multiple comparison procedures available, but we shall deal with just two of them. Such procedures are a relatively recent addition to statistical theory, having been developed during the 1950's; their use in research in the behavioral sciences dates from about 1957 (see McHugh and Ellis, 1955; Stanley, 1957a; Sparks, 1963; and Kenyon, 1965).

One early procedure is Keuls' (1952) extension of earlier work by Newman; it is called, quite appropriately, the Newman-Keuls procedure (see Winer, 1962). Another early worker in this field was Duncan (1955), who produced Duncan's New Multiple
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Range Test. Duncan’s procedure has probably been the most popular in 
the behavioral sciences and education; but until mathematical statisticians resolve their differences or satisfy one another that the derivation of the procedure is valid (see Scheffé, 1959, p. 78, fn. 16), it might be wise for persons in applied fields to observe a moratorium on Duncan's procedure.

If a researcher wishes to compare several treatment groups with a control group to decide which treatments differ from the single control, the procedure due to Dunnett (1955) is available (see Winer, 1962, pp. 90-91). One of the few published reports of educational research in which Dunnett's procedure was used is an article by Scannell and Marshall (1966). You should find this article informative as to the context in which the Dunnett procedure is used. The two most useful multiple comparison procedures are those developed by Tukey and Scheffé: the T-method and the S-method, respectively. Although the T-method and S-method have very general forms, we shall deal with specific cases that encompass almost all of the applications of these methods to problems you are likely to meet.


16.2 

THE T-METHOD

The null hypothesis in a one-way ANOVA has been rejected at the $\alpha$-level of significance. J treatments were compared, and the number of observations, n, in each group was equal. All assumptions for running the ANOVA were met (at least no evidence existed that any one was violated). Among the J treatments there are J(J − 1)/2 pairs (i.e., combinations of J means taken two at a time). Each of these yields a simple comparison of the form $\bar{X}_{.j} - \bar{X}_{.j^*}$, where $j \ne j^*$. By observation of these J(J − 1)/2 differences between sample means and application of the T-method, the experimenter wishes to decide whether he can regard each $\mu_j - \mu_{j^*}$ as being different from zero. First we shall see how his decision is reached, then we shall discuss the probability values that apply to his decisions.


Step 1. All J(J − 1)/2 comparisons between sample means of the form $\bar{X}_{.j} - \bar{X}_{.j^*}$ are computed. For example, if three treatments are compared, then $\bar{X}_{.1} - \bar{X}_{.2}$, $\bar{X}_{.1} - \bar{X}_{.3}$, and $\bar{X}_{.2} - \bar{X}_{.3}$ are calculated.

Step 2. All comparisons $\bar{X}_{.j} - \bar{X}_{.j^*}$ are divided by $\sqrt{MS_w/n}$, where $MS_w$ is the mean-square within factor levels from the one-way ANOVA and n is the number of observations in any one group.

Step 3. The 100(1 − $\alpha$) percentile point in the Studentized range distribution with degrees of freedom J and J(n − 1) is found from Table F in Appendix A. This percentile point is denoted ${}_{1-\alpha}q_{J,J(n-1)}$.

The Studentized range is the difference between the largest and the smallest means of J independent samples each of size n from a normal population, divided by $\sqrt{MS_w/n}$. There is a family of distributions of the Studentized range, since a different distribution results for all pairs of values of J and n. The two parameters used to identify a particular Studentized range distribution are J, the number of samples, and J(n − 1), the degrees of freedom for $MS_w$.

Step 4. All J(J − 1)/2 differences $\bar{X}_{.j} - \bar{X}_{.j^*}$ divided by $\sqrt{MS_w/n}$ are compared with the percentile point. It is concluded that $\bar{X}_{.j}$ and $\bar{X}_{.j^*}$ are significantly different, i.e., present evidence that $\mu_j$ and $\mu_{j^*}$ are different, if $|\bar{X}_{.j} - \bar{X}_{.j^*}|$ over $\sqrt{MS_w/n}$ is greater than ${}_{1-\alpha}q_{J,J(n-1)}$.

An example of the application of the T-method is as follows. An experiment comparing three methods (J = 3), each having 11 observations per group (n = 11), yields $\bar{X}_{.1}$ = 22.60, $\bar{X}_{.2}$ = 23.40, $\bar{X}_{.3}$ = 28.50, and $MS_w$ = 4.10. An F-ratio significant at the .05 level is obtained.

Step 1. $\bar{X}_{.1} - \bar{X}_{.2}$ = −0.80.

$\bar{X}_{.1} - \bar{X}_{.3}$ = −5.90.

$\bar{X}_{.2} - \bar{X}_{.3}$ = −5.10.

Step 2. Dividing the above differences by $\sqrt{MS_w/n} = \sqrt{4.10/11}$ = .610 gives −1.311, −9.672, and −8.361.

Step 3. From Table F we see that ${}_{.95}q_{3,30}$ = 3.49.

Step 4. Any absolute difference between means divided by $\sqrt{MS_w/n}$ that exceeds 3.49 is significant. Thus, it can be concluded, on the basis of the T-method, that the population means for groups 1 and 2 do not differ (because 1.311 < 3.49), and the mean of population 3 differs from the means of both populations 1 and 2.


16.3 

CONFIDENCE INTERVALS 
AROUND CONTRASTS BY 
THE T-METHOD 


Establishing confidence intervals around the differences $\bar{X}_{.j} - \bar{X}_{.j^*}$ should be considered as important as, or perhaps more important than, the mere decision of whether the difference is significant. Using the T-method, one can establish a set of simultaneous confidence intervals for the differences between sample means. The confidence interval around $\bar{X}_{.j} - \bar{X}_{.j^*}$ is found from the following formula:

$$(\bar{X}_{.j} - \bar{X}_{.j^*}) \pm {}_{1-\alpha}q_{J,J(n-1)}\sqrt{MS_w/n} \qquad (16.1)$$

In the example used to illustrate the significance testing function of the T-method, $\bar{X}_{.1}$ = 22.60, $\bar{X}_{.2}$ = 23.40, $\bar{X}_{.3}$ = 28.50, $\sqrt{MS_w/n}$ = .610, and ${}_{.95}q_{3,30}$ = 3.49. To establish confidence intervals around the three possible differences between means, one adds and subtracts (3.49)(.610) = 2.13 from each difference. These calculations are performed in Table 16.1.

TABLE 16.1 ESTABLISHING CONFIDENCE INTERVALS AROUND DIFFERENCES
BETWEEN THREE SAMPLE MEANS USING THE T-METHOD

$\bar{X}_{.j} - \bar{X}_{.j^*}$             ${}_{1-\alpha}q_{J,J(n-1)}\sqrt{MS_w/n}$     Final calculations

$\bar{X}_{.1} - \bar{X}_{.2}$ = −0.80       (3.49)(.610) = 2.13       −0.8 ± 2.13 = (−2.93, 1.33)

$\bar{X}_{.1} - \bar{X}_{.3}$ = −5.90       2.13                      −5.9 ± 2.13 = (−8.03, −3.77)

$\bar{X}_{.2} - \bar{X}_{.3}$ = −5.10       2.13                      −5.1 ± 2.13 = (−7.23, −2.97)

Notice that the single confidence interval that includes zero between its bounds corresponds to the nonsignificant difference between the sample means $\bar{X}_{.1}$ and $\bar{X}_{.2}$, whereas the other two differences are significant.

Notice that the quantity added to and subtracted from the sample mean difference is the same for any value of j and j*. Thus, one determines ${}_{1-\alpha}q_{J,J(n-1)}\sqrt{MS_w/n}$ and adds it to and subtracts it from each of the J(J − 1)/2 mean differences to establish a set of simultaneous confidence intervals by the T-method. Of course, the purpose of any confidence interval of this type is to capture within its limits the value of $\mu_j - \mu_{j^*}$. The simultaneous confidence intervals are constructed in such a way that the confidence coefficient for any single interval is not 1 − $\alpha$, however. The meaning of the term simultaneous and the nature of the inference being made with such confidence intervals should become clearer in the following paragraph.
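
The interval calculations can be sketched in code. The function name and return layout below are hypothetical; the critical value 3.49 is the tabled value used in the example and is supplied by hand.

```python
import math
from itertools import combinations


def t_method_intervals(means, ms_within, n, q_crit):
    """Simultaneous T-method intervals: (mean_j - mean_j*) +/- q * sqrt(MS_w / n)."""
    half_width = q_crit * math.sqrt(ms_within / n)
    intervals = {}
    for (j, mean_j), (k, mean_k) in combinations(enumerate(means, start=1), 2):
        diff = mean_j - mean_k
        intervals[(j, k)] = (diff - half_width, diff + half_width)
    return intervals


intervals = t_method_intervals([22.60, 23.40, 28.50], ms_within=4.10, n=11, q_crit=3.49)
for pair, (low, high) in intervals.items():
    # An interval that contains zero corresponds to a nonsignificant difference.
    print(pair, round(low, 2), round(high, 2), low <= 0.0 <= high)
```

Only one half-width is computed because, with equal n's, the same quantity is added to and subtracted from every difference.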

In an experiment comparing five groups, there are 5(4)/2 = 10 comparisons
of sample means that can be made. Confidence intervals can be
established around all 10 of these sample mean comparisons by the T-method
using Eq. (16.1) and an $\alpha$ of .05, for example. In the set of 10 confidence
intervals, some of them will include $\mu_j - \mu_{j^*}$ within their limits and some
will not. The same experiment could be run again and again until thousands
of sets of 10 confidence intervals are accumulated. The proportion of these
thousands of experiments in which all 10 of the computed confidence intervals
around the 10 comparisons $\bar{X}_{.j} - \bar{X}_{.j^*}$ include within their limits the value
of $\mu_j - \mu_{j^*}$ is equal to .95, i.e., $(1 - \alpha)$. In other words, in $100(1 - \alpha)$
percent of the experiments in which J groups of n each are compared, the
T-method will yield $J(J - 1)/2$ confidence intervals all of which include
$\mu_j - \mu_{j^*}$ within their limits. In some experiments, perhaps only one of the
$J(J - 1)/2$ confidence intervals does not capture $\mu_j - \mu_{j^*}$; in other experiments,
more than one of the set of simultaneous confidence intervals will
not capture $\mu_j - \mu_{j^*}$; in the long run, only $100(\alpha)$ percent of the experiments
will have one or more of the $J(J - 1)/2$ confidence intervals that do not
capture $\mu_j - \mu_{j^*}$.
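
The long-run interpretation described above can be checked by simulation. The sketch below is not part of the original text and rests on several assumptions: normal populations with equal variances, J = 3 groups of n = 11 (the layout of Table 16.1), and the tabled critical value 3.49. All population means are set equal, which loses no generality because the intervals concern differences. The function name `simultaneous_coverage` is ours.

```python
import random
from itertools import combinations
from math import sqrt


def simultaneous_coverage(n_groups=3, n=11, q_crit=3.49, reps=4000, seed=1):
    """Estimate the proportion of experiments in which ALL pairwise
    T-method intervals capture the true mean difference (here, zero)."""
    rng = random.Random(seed)
    all_covered = 0
    for _ in range(reps):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_groups)]
        means = [sum(s) / n for s in samples]
        # Mean square within: pooled within-group sum of squares over J(n - 1).
        ss_within = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
        ms_within = ss_within / (n_groups * (n - 1))
        half_width = q_crit * sqrt(ms_within / n)
        # Every true difference is 0, so an interval covers it iff |diff| <= half-width.
        if all(abs(a - b) <= half_width for a, b in combinations(means, 2)):
            all_covered += 1
    return all_covered / reps


coverage = simultaneous_coverage()
print(coverage)  # should be close to .95, up to simulation error
```

The proportion refers to whole experiments, not to individual intervals, which is what distinguishes a simultaneous confidence coefficient from the single-interval coefficient.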


This property of simultaneous confidence intervals is a radical departure
from the properties of the confidence intervals met earlier in this
text [...] that a confidence interval of [...].

Suppose that J = 3 groups of n persons each are being compared on
some variable. There are 3(2)/2 = 3 comparisons of means that can be made.
Suppose that T-method confidence intervals with $\alpha$ = .10 are established around the
three comparisons $\bar{X}_{.j} - \bar{X}_{.j^*}$. Conceivably, the
process of drawing three groups of n observations and establishing T-method
intervals around the comparisons could be repeated [...].
The graph in Fig. 16.1 shows the results of this process.

The small dots represent [...] differences. The three dots above
1 on the horizontal scale in Fig. 16.1 represent the three differences between
means obtained the first time [...]. The [...] an equal distance above and below [...]
by the T-method. Notice that in the first experiment all three confidence
intervals include the true value of $\mu_j - \mu_{j^*}$. We should expect this to be
true with 90% = $100(1 - \alpha)$ of the experiments (J = 3, n = [...]) performed.
[...] that in the fifth [...] of the three intervals do not cover
the population mean differences. We would expect that one, two, or all
FIG. 16.1 Sets of simultaneous confidence intervals constructed by the
T-method for 10 replications of an experiment comparing three treatments.


three of the intervals would fail to capture the population mean differences 
in about 10% = 100(a) of the experiments. 

One weakness seems to exist in the adoption of an “experiment-wise”
error rate when making multiple comparisons. This weakness relates to
the concept of an experiment itself. Generally in research in education and
the social sciences, the choice of the number of levels that will constitute a
factor in an experiment is arbitrary. Seldom do compelling reasons exist
for making this choice. The experimenter would willingly include any number
of additional levels in the experiment for comparison if they presented
themselves on his doorstep. Some experimenters choose to include a “control
group” in any experiment, whereas others do not. The definition of a “factor”
(a collection of levels to be compared) in an experiment is quite arbitrary.
However, a contrast between method A and method B always means the
same thing whether it stands alone or is imbedded in a factor with a dozen
levels. (See Wilson, 1962 and Ryan, 1959 and 1962.)

Inspection of the Tukey method shows that the probability of detecting
[...] depends
upon J, the number of groups compared in the experiment. The width of
[...] groups one happens to include in
the “experiment.”

Although some case can be made for the prominence of a contrast
or comparison instead of the experiment as the unit for controlling the rate
of errors of the first kind, the statistical techniques most commonly regarded
as achieving this purpose, most notably the Newman-Keuls procedure (see
Winer, 1962, pp. [...]), are not [...] that
the Newman-Keuls procedure does not test the significance of the difference
between means separately for all pairs of means with a contrast-wise error
rate of $\alpha$ as is widely believed.


16.4 

THE S-METHOD


[...] whenever the [...] comprehensive discussion of [...].

The constants $c_1, \ldots, c_J$ [...] sum to zero. Thus,

    $\psi = \mu_1 + \mu_2 - 2\mu_3$

[...] positive and negative real numbers [...] sum to zero [...]
is a contrast in the three means [...]. The difference between [...]
to $c_j = 1$, $c_{j^*} = -1$, and all other c's equal to 0.
Suppose there are five population means [...]


TABLE 16.1 


Table 16.2 lists 


1. #*!-/*■ _J 1 0 * 1 

2. /<»- [tjil + .... 1 —J 1 J A 

3. 10'. + »> + /‘‘M “ t0 “ + /4,)/ -1 0 1 0 ° 

4. /i, - /*! ^ ' 

contrasts among the five means [...]
replacing $\mu$'s with sample means:

    $\hat{\psi} = c_1\bar{X}_{.1} + c_2\bar{X}_{.2} + \cdots + c_J\bar{X}_{.J}$.     (16.2)

For example, if $\psi = \mu_1 - $ [...].

Before the significance of any estimate [...] can be judged, it is
[...] ANOVA has been [...]:

    $\hat{\sigma}^2_{\hat{\psi}} = MS_w\left(\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \cdots + \frac{c_J^2}{n_J}\right)$,     (16.3)

where $MS_w$ is the “mean-square within” [...] and $n_j$ is the number of
observations in [...]. The degrees of freedom [...] ANOVA that precedes the
S-method, [...]. For example, if J = 4, [...] an estimate
of the variance of $\hat{\psi}$ is [...].

The degrees of freedom for this estimate of variance are 40 - 4 = 36.

zero. This test answers questions like the following:

1. Is there a difference between overt and covert responding to a
teaching program? ($\mu_1 - \mu_2$)

2. [...] overt responding with and without confirmation
[...] same [...] for covert responding? [...]

3. [...] three levels of positive reinforcement in an
experiment differ [...] reinforcement? [...]

The following steps are followed in testing the significance of a contrast
among means:

Step 1. Specify and estimate $\psi$. The coefficients [...] sample means [...].

Step 2. Find the estimated variance of $\hat{\psi}$. [...] $MS_w$ [...] and the sample
sizes are substituted into Eq. (16.3) to produce $\hat{\sigma}^2_{\hat{\psi}}$.

Step 3. Find $\hat{\sigma}_{\hat{\psi}}$. Take the positive square root of $\hat{\sigma}^2_{\hat{\psi}}$, which
was found in step 2.

Step 4. Form the ratio of $\hat{\psi}$ to $\hat{\sigma}_{\hat{\psi}}$. [...]

Step 5. Compare the ratio with the test statistic. The absolute value of the
ratio found in step 4 is compared with the square root of (J - 1) times the
$100(1 - \alpha)$ percentile point in the F-distribution with degrees of freedom
J - 1 and N - J. That is, $|\hat{\psi}|/\hat{\sigma}_{\hat{\psi}}$ is compared with
$\sqrt{(J - 1)\,{}_{1-\alpha}F_{J-1,N-J}}$; $H_0$: $\psi = 0$ is rejected if the absolute
value of the ratio exceeds the square root of (J - 1) times the percentile point, i.e.,

    reject $H_0$: $\psi = 0$ if $\frac{|\hat{\psi}|}{\hat{\sigma}_{\hat{\psi}}} > \sqrt{(J - 1)\,{}_{1-\alpha}F_{J-1,N-J}}$.
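
The five steps can be collected into one small function. The following is a sketch, not the authors' own procedure in code: the name `scheffe_test` and its argument names are ours, and the F percentile point must be supplied from a table. It is applied here to the contrast between the average of groups 1 and 2 and group 3, using the group means, sample sizes, $MS_w$ = 8.35, and tabled F value 2.72 from the four-group example in this section.

```python
from math import sqrt


def scheffe_test(means, ns, coeffs, ms_within, f_crit):
    """Carry out steps 1-5 of the S-method for one contrast.

    coeffs must sum to zero; f_crit is the 100(1 - alpha) percentile point
    of F with J - 1 and N - J degrees of freedom, read from a table.
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    n_groups = len(means)
    psi_hat = sum(c * m for c, m in zip(coeffs, means))              # step 1
    variance = ms_within * sum(c * c / n for c, n in zip(coeffs, ns))  # step 2, Eq. (16.3)
    std_error = sqrt(variance)                                       # step 3
    ratio = psi_hat / std_error                                      # step 4
    critical = sqrt((n_groups - 1) * f_crit)                         # step 5
    return psi_hat, std_error, ratio, critical, abs(ratio) > critical


psi_hat, std_error, ratio, critical, reject = scheffe_test(
    means=[10.32, 10.54, 12.86, 7.17],
    ns=[20, 25, 20, 15],
    coeffs=[0.5, 0.5, -1.0, 0.0],
    ms_within=8.35,
    f_crit=2.72,
)
print(round(psi_hat, 2), round(std_error, 3), round(ratio, 2), round(critical, 2), reject)
```

Passing the coefficients explicitly keeps the function usable for pairwise differences and for more complex contrasts alike.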

[...] method of [...] obtained in the experiment:

Group     Sample mean                   n
  1       $\bar{X}_{.1}$ = 10.32        $n_1$ = 20
  2       $\bar{X}_{.2}$ = 10.54        $n_2$ = 25        $MS_w$ = 8.35
  3       $\bar{X}_{.3}$ = 12.86        $n_3$ = 20
  4       $\bar{X}_{.4}$ = 7.17         $n_4$ = 15


The F-ratio in the analysis of variance of these data is significant at
the .05 level. It is now the purpose of the S-method to uncover the groups
contributing to this significant result.

Very likely, all possible pairs of differences between the sample means
will be of interest. Thus, the significance of the following differences between
sample means must be determined:

    $\bar{X}_{.1} - \bar{X}_{.2}$        $\bar{X}_{.2} - \bar{X}_{.3}$
    $\bar{X}_{.1} - \bar{X}_{.3}$        $\bar{X}_{.2} - \bar{X}_{.4}$
    $\bar{X}_{.1} - \bar{X}_{.4}$        $\bar{X}_{.3} - \bar{X}_{.4}$

Suppose in addition that a slightly more complicated contrast is of
interest to the experimenter. It was the case that groups 1 and 2 were taught
by two methods differing only in that group 1 actually observed the
precipitation of silver chloride while group 2 did not. The data suggest that this
experience for group 1 was irrelevant; in fact, groups 1 and 2 may not be
different. The experimenter wishes to know whether the average of groups
1 and 2 differs significantly from group 3, which was taught by a substantially
different method. This question suggests the contrast $[(\mu_1 + \mu_2)/2] - \mu_3$.


TABLE 16.4

Contrast                            Estimate of contrast                                  Estimate of variance of contrast

A. $\mu_1 - \mu_2$                  $\bar{X}_{.1} - \bar{X}_{.2}$                         $MS_w(1/n_1 + 1/n_2)$
B. $\mu_1 - \mu_3$                  $\bar{X}_{.1} - \bar{X}_{.3}$                         $MS_w(1/n_1 + 1/n_3)$
C. $\mu_1 - \mu_4$                  $\bar{X}_{.1} - \bar{X}_{.4}$                         $MS_w(1/n_1 + 1/n_4)$
D. $\mu_2 - \mu_3$                  $\bar{X}_{.2} - \bar{X}_{.3}$                         $MS_w(1/n_2 + 1/n_3)$
E. $\mu_2 - \mu_4$                  $\bar{X}_{.2} - \bar{X}_{.4}$                         $MS_w(1/n_2 + 1/n_4)$
F. $\mu_3 - \mu_4$                  $\bar{X}_{.3} - \bar{X}_{.4}$                         $MS_w(1/n_3 + 1/n_4)$
G. $[(\mu_1 + \mu_2)/2] - \mu_3$    $[(\bar{X}_{.1} + \bar{X}_{.2})/2] - \bar{X}_{.3}$    $MS_w(1/(4n_1) + 1/(4n_2) + 1/n_3)$


The contrasts in which the experimenter is interested appear in Table
16.4. These specifications complete steps 1 and 2 of the five steps to be
followed. If we substitute the data into the formulas in Table 16.4, the
estimates (see Table 16.5) of the contrasts and the variances of the contrasts
result.


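TABLE 16.5

Contrast                            $\hat{\psi}$      $\hat{\sigma}^2_{\hat{\psi}}$

A. $\mu_1 - \mu_2$                  -0.22             0.752
B. $\mu_1 - \mu_3$                  -2.54             0.835
C. $\mu_1 - \mu_4$                   3.15             0.969
D. $\mu_2 - \mu_3$                  -2.32             0.752
E. $\mu_2 - \mu_4$                   3.37             0.885
F. $\mu_3 - \mu_4$                   5.69             0.969
G. $[(\mu_1 + \mu_2)/2] - \mu_3$    -2.43             0.605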


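As one example of how the calculations in Table 16.5 were made,
consider contrast G. The contrast is $\psi = [(\mu_1 + \mu_2)/2] - \mu_3$; the estimate
$\hat{\psi}$ of the contrast (step 1) is

    $\hat{\psi} = \frac{\bar{X}_{.1} + \bar{X}_{.2}}{2} - \bar{X}_{.3} = \frac{10.32 + 10.54}{2} - 12.86 = -2.43$.

The estimate of the variance of this contrast (step 2) is

    $\hat{\sigma}^2_{\hat{\psi}} = MS_w\left(\frac{1}{4n_1} + \frac{1}{4n_2} + \frac{1}{n_3}\right) = 8.35\left(\frac{1}{80} + \frac{1}{100} + \frac{1}{20}\right) = 0.61$.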

Step 3. The positive square roots of the estimates of the
variance of all seven contrasts are now found:

Contrast     $\hat{\sigma}^2_{\hat{\psi}}$     $\hat{\sigma}_{\hat{\psi}}$

A            0.752                             0.867
B            0.835                             0.914
C            0.969                             0.985
D            0.752                             0.867
E            0.885                             0.941
F            0.969                             0.985
G            0.605                             0.778


Step 4. The ratio of $\hat{\psi}$ to $\hat{\sigma}_{\hat{\psi}}$ is found for each of the seven
contrasts.

Contrast     $\hat{\psi}/\hat{\sigma}_{\hat{\psi}}$

A            -0.22/0.867 = -0.25
B            -2.54/0.914 = -2.76
C             3.15/0.985 = 3.21
D            -2.32/0.867 = -2.67
E             3.37/0.941 = 3.59
F             5.69/0.985 = 5.81
G            -2.43/0.778 = -3.12


Step 5. The absolute values of the ratios found in step 4 are
compared with $\sqrt{(J - 1)\,{}_{1-\alpha}F_{J-1,N-J}}$.

N = 20 + 25 + 20 + 15 = 80 and J = 4. $_{.95}F_{3,76}$ = 2.72.

    $\sqrt{(J - 1)\,{}_{.95}F_{3,76}} = \sqrt{3(2.72)} = 2.86$.

Therefore, if any ratio found in step 4 is greater than 2.86 in absolute
value, i.e., if the ratio is above 2.86 or below -2.86, the corresponding
contrast is significant. By this criterion, contrasts C, E, F, and G are
significantly different from zero by the S-method with $\alpha$ = .05. Thus the
experimenter concludes that

    $\mu_1 - \mu_4$, $\mu_2 - \mu_4$, $\mu_3 - \mu_4$, and $[(\mu_1 + \mu_2)/2] - \mu_3$

are not zero. It cannot be concluded that $\mu_1$ and $\mu_2$ differ, or that $\mu_1$ and
$\mu_3$ differ, or that $\mu_2$ and $\mu_3$ differ. For a published application of the S-method
to educational psychological research, see Travers et al. (1964).


16.5 

CONFIDENCE INTERVALS 
AROUND CONTRASTS BY 
THE S-METHOD 

The confidence interval around the estimate of a contrast is constructed as
follows with the S-method:

    $\hat{\psi} \pm \hat{\sigma}_{\hat{\psi}}\sqrt{(J - 1)\,{}_{1-\alpha}F_{J-1,N-J}}$.     (16.4)

Of course, the purpose of this confidence interval is to capture between its
extremes the true value of the contrast in the population means, $\psi$.
If $\psi = c_1\mu_1 + c_2\mu_2 + \cdots + c_J\mu_J$, then the confidence interval for $\psi$
around $\hat{\psi}$ is

    $(c_1\bar{X}_{.1} + c_2\bar{X}_{.2} + \cdots + c_J\bar{X}_{.J}) \pm \sqrt{MS_w\left(\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2} + \cdots + \frac{c_J^2}{n_J}\right)}\,\sqrt{(J - 1)\,{}_{1-\alpha}F_{J-1,N-J}}$.     (16.5)

Suppose, for example, that $\psi = \mu_1 - \mu_2$. The confidence interval around
$\hat{\psi}$ would be

    $(\bar{X}_{.1} - \bar{X}_{.2}) \pm \sqrt{MS_w\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\,\sqrt{(J - 1)\,{}_{1-\alpha}F_{J-1,N-J}}$.

[...] of the difference of the estimated contrasts from zero
[...] of Eq. (16.4) for such confidence intervals is
[...]. The construction of S-method confidence intervals
will now be illustrated on the example used in Sec. 16.4.

A contrast is significantly different from zero if the confidence
interval around $\hat{\psi}$ does not span zero. Hence, contrasts C, E, F, and G in
Table 16.6 correspond to contrasts that differ significantly from zero. In

TABLE 16.6 ILLUSTRATION OF THE CONSTRUCTION OF S-METHOD
CONFIDENCE INTERVALS BY EQ. (16.4)

Contrast     $\hat{\psi}$     $\hat{\sigma}_{\hat{\psi}}$     $\hat{\psi} \pm \hat{\sigma}_{\hat{\psi}}\sqrt{(J - 1)\,{}_{.95}F_{3,76}} = \hat{\psi} \pm \hat{\sigma}_{\hat{\psi}}(2.86)$

A            -0.22            0.867                           (-2.71, 2.27)
B            -2.54            0.914                           (-5.17, 0.09)
C             3.15            0.985                           (0.35, 5.95)
D            -2.32            0.867                           (-4.80, 0.16)
E             3.37            0.941                           (0.68, 6.06)
F             5.69            0.985                           (2.89, 8.49)
G            -2.43            0.778                           (-4.66, -0.20)


the case of contrast C, the S-method [...] the confidence [...]
number greater than zero; indeed it probably lies between 0.35 and 5.95. Similarly,
the confidence interval on contrast G tells us that the difference between $\mu_3$ and
the average of the population means $\mu_1$ and $\mu_2$ is [...] zero. On the other
hand, the S-method gives us no evidence to conclude with confidence that $\mu_1$
and $\mu_2$ differ (contrast A).

[...] S-method must be regarded [...]. As was the case with intervals
[...] by the T-method, no probability [...] is made about each
individual interval. On the contrary, the intervals are constructed in such a
way that the entire set of those which can be constructed in any one experiment
has a probability of $1 - \alpha$ of capturing the true value of the contrasts
estimated within its bounds. In other words, if all possible contrasts among
a set of means are formed by the S-method with a confidence coefficient of
$1 - \alpha$, the statement that “all of these S-method confidence intervals
contain the true value of the contrasts they estimate” will be true $100(1 - \alpha)$
percent of the times this experiment is run.
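
The intervals of Table 16.6 follow mechanically from Eq. (16.4). The sketch below recomputes them from the group means, sample sizes, $MS_w$ = 8.35, and tabled F value 2.72 of the example in Sec. 16.4; the function name `s_method_interval` and the dictionary of coefficients are our own conveniences, not notation from the text. Because the code carries full precision rather than the rounded intermediate values shown in the tables, some endpoints differ from the tabled ones in the second decimal place.

```python
from math import sqrt


def s_method_interval(means, ns, coeffs, ms_within, f_crit):
    """S-method confidence interval for one contrast, following Eq. (16.4)."""
    n_groups = len(means)
    psi_hat = sum(c * m for c, m in zip(coeffs, means))
    std_error = sqrt(ms_within * sum(c * c / n for c, n in zip(coeffs, ns)))
    half_width = std_error * sqrt((n_groups - 1) * f_crit)
    return psi_hat - half_width, psi_hat + half_width


means = [10.32, 10.54, 12.86, 7.17]
ns = [20, 25, 20, 15]
contrasts = {
    "A": [1, -1, 0, 0],
    "B": [1, 0, -1, 0],
    "C": [1, 0, 0, -1],
    "D": [0, 1, -1, 0],
    "E": [0, 1, 0, -1],
    "F": [0, 0, 1, -1],
    "G": [0.5, 0.5, -1, 0],
}

intervals = {
    label: s_method_interval(means, ns, coeffs, ms_within=8.35, f_crit=2.72)
    for label, coeffs in contrasts.items()
}
# A contrast is judged significant when its interval does not span zero.
significant = [label for label, (low, high) in intervals.items() if low > 0 or high < 0]
print(significant)
```

Reading significance off the intervals gives the same decisions as the ratio test of Sec. 16.4, since both compare the same quantities.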


16.6 

THE T- AND S-METHODS

COMPARED

The Scheffé method of multiple comparisons is generally regarded by
mathematicians as superior to the T-method because of its generality (equal
n's are not necessary) and greater sensitivity when complex combinations of
the sample means are being estimated. However, you will probably find
greater use for the T-method.

An obvious criterion for the choice between use of the T-method or
the S-method is the distribution of N among the J groups. A requirement
of the T-method, which is not made with the S-method, is that the sample
sizes are equal. Hence the T-method cannot be used when not all n's are
equal; the S-method must be used with unequal n's. This dictum must be
tempered with good sense. We shall see presently that if one is interested
in only the differences among pairs of J sample means, the T-method is far
superior to the S-method, in terms of both power to detect significant
differences and shorter confidence intervals. If the n's are only slightly unequal,
e.g., $n_1$ = 21, $n_2$ = 22, and $n_3$ = 20, it is wiser to discard at random one
person from group 1 and two persons from group 2 and apply the T-method
than to apply the S-method. The random exclusion of three persons from
the analysis will have a negligible effect on the results, and it allows you to
use the more powerful T-method for comparing sample means. On the
other hand, this strategy would not work well if, for example, $n_1$ = 5,
$n_2$ = 30, and $n_3$ = 40 because too much data would have to be discarded
to achieve equal n's of 5. In this instance it would be better to use the
S-method even if you were interested only in the three possible differences
between sample means.

Suppose that after finding a significant F-ratio in an analysis of variance
with equal n's, the experimenter was interested in probing among the
$J(J - 1)/2$ differences between sample means. Should he use the T-method
or the S-method? Either can be applied, but which one is preferred? The
T-method is preferred because it produces a greater number of significant
differences between means. Equivalently, the T-method will give shorter
confidence intervals around differences between means than the S-method.




FIG. 16.2 Schema for making multiple comparisons among J group
means. (Based on Hopkins and Chadbourn, 1967.)

As an example of the relative widths of the confidence intervals around
a difference between means given by the T-method and the S-method, take the contrast
between $\bar{X}_{.2}$ and $\bar{X}_{.3}$ in Table 16.1. The .95 confidence interval around
$\bar{X}_{.2} - \bar{X}_{.3}$ was found to be (-7.23, -2.97). Now let's establish a .95
confidence interval around the same difference using the S-method.

    $\hat{\psi} = \bar{X}_{.2} - \bar{X}_{.3} = -5.1$,  $MS_w = 4.1$,  $n_2 = n_3 = 11$.

    $\hat{\sigma}^2_{\hat{\psi}} = 4.1(1/11 + 1/11) = 0.745$,  $\sqrt{(J - 1)\,{}_{.95}F_{2,30}} = \sqrt{2(3.32)} = 2.58$.

The S-method confidence interval around $\bar{X}_{.2} - \bar{X}_{.3}$ is -5.1 ± (2.58)(.86) =
(-7.32, -2.88).

This confidence interval is wider, but not by much, than the confidence
interval around the same difference between means yielded by the T-method
(-7.23, -2.97). In some instances, the S-method will produce a confidence
interval that includes zero (in which case the difference between the means
is judged nonsignificant) while the T-method will produce a confidence
interval around the same difference that does not include zero, thus leading
to judgment of a significant difference between the two means.
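
The comparison of the two widths can be reproduced directly. This short sketch uses only the quantities quoted in the text ($MS_w$ = 4.1, n = 11, J = 3) together with the tabled values 3.49 for the studentized range and 3.32 for F; the variable names are ours.

```python
from math import sqrt

ms_within, n, n_groups = 4.1, 11, 3
q_crit = 3.49  # .95 point of the studentized range, J = 3, 30 df (from the text)
f_crit = 3.32  # .95 point of F with 2 and 30 df (from the text)

# T-method: the same half-width for every pairwise difference.
t_half_width = q_crit * sqrt(ms_within / n)

# S-method: standard error of a pairwise difference times sqrt((J - 1) F).
s_half_width = sqrt((n_groups - 1) * f_crit) * sqrt(ms_within * (1 / n + 1 / n))

print(round(t_half_width, 2), round(s_half_width, 2))
```

For this pairwise difference the S-method interval is roughly 4 percent wider than the T-method interval.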

When one is interested in contrasts that are more complex than a simple
difference between means, e.g., $[(\mu_1 + \mu_2)/2] - \mu_3$, then the S-method has
more power (and thus yields shorter confidence intervals around an estimate
of the contrast) than an extension of the T-method appropriate for such
contrasts. We have not discussed the extension of the T-method to contrasts
other than $\mu_j - \mu_{j^*}$ because it would only be a needless complication in
view of the fact that we have developed the S-method, which is preferable
with the more complex contrasts. You can find a readable account of the
more general T-method in Guenther (1964, pp. 54-57) and a mathematical
discussion of the method in Scheffé (1959, pp. 73-77).

The application of the T-method (or the S-method, when n's differ
appreciably) to the differences between sample means accounts for almost
all of the applications of multiple comparison procedures in educational
research and the behavioral sciences. Interest in more complex contrasts,
for which the S-method would be more appropriate, is not common.
Undoubtedly, however, many opportunities for exploring interesting and
worthwhile contrasts with the S-method are overlooked.

A recent textbook on simultaneous statistical inference by Miller (1966) 
is an excellent reference on these and related topics. 


PROBLEMS AND EXERCISES 

1. From Table F in Appendix A, determine the following percentile points:
a - .ftiflr.M b. ss^a.int c ■ d. . m 0j 2 ,b e. 

2. Suppose that five samples each of size 7 are to be randomly drawn from a single
normal distribution with mean $\mu$ and variance $\sigma^2$. The range between the
largest, $\bar{X}_L$, and smallest, $\bar{X}_S$, sample mean will be found and divided by
$\sqrt{MS_w/7}$, where $MS_w$ is the average of the five sample variances.

a. What is the probability that $(\bar{X}_L - \bar{X}_S)/\sqrt{MS_w/7}$ will be larger than 4.102?

b. If this sampling procedure were carried out thousands of times, what percent
of the values of $(\bar{X}_L - \bar{X}_S)/\sqrt{MS_w/7}$ would exceed 5.048?

3. If $(\bar{X}_{.j} - \bar{X}_{.j^*}) \pm {}_{.95}q_{J,J(n-1)}\sqrt{MS_w/n}$ does not contain zero, then the difference
between $\bar{X}_{.j}$ and $\bar{X}_{.j^*}$ is judged to be significant at the .05 level by the Tukey
method. This procedure is equivalent to concluding that $\bar{X}_{.j}$ and $\bar{X}_{.j^*}$ differ
significantly at the .05 level if

    $\frac{|\bar{X}_{.j} - \bar{X}_{.j^*}|}{\sqrt{MS_w/n}} > {}_{.95}q_{J,J(n-1)}$.

Suppose that in a particular experiment, J = 6, n = 11, and $MS_w$ = 44.00.
Which of the following means differ significantly from each other at the $\alpha$ = .05
level by the T-method: $\bar{X}_{.1}$ = 61.25, $\bar{X}_{.2}$ = 64.72, $\bar{X}_{.3}$ = 70.57, $\bar{X}_{.4}$ = 73.42,
$\bar{X}_{.5}$ = 81.66, and $\bar{X}_{.6}$ = 82.17?

4. In Prob. 7 at the end of Chapter 15, you performed an ANOVA on data gathered
by Guthrie (1967) in an experiment on “discovery learning.” Using the
[...] multiple [...] all pairs of means in Guthrie's
experiment using an $\alpha$ of .05.

5. [...] (1967) compared phonics (letter) and look-say (word)
[...] reading instruction on a measure of transfer of training. Twenty
pupils learned [...] words by the phonic method and 20
pupils learned the same words by the look-say [...].

                Number of trials to learn second list

                Phonic        Look-say
                method        method        Control

    n           20            20            20
    Mean        13.50         27.20         29.25
    [...]       9.94          10.37         [...]

The value of $MS_w$ was [...]. [...] comparisons at the .01 level
using the Tukey method, determine which pairs of means differ significantly.

6. Four methods of teaching [...] (case method, formula method, equation
method, unitary analysis method) were compared (Sparks,
1963). Twenty-eight sixth-grade [...] randomly assigned to the four
methods; seven classes studied under each [...]. At the conclusion of the
teaching unit a 45-item test on computing [...]
class and the class average recorded.

                Average test score for each class

    Case            Formula         Equation        Unitary analysis
    method (1)      method (2)      method (3)      method (4)

    14.59           20.27           27.82           33.16
    23.44           26.84           24.92           26.93
    25.43           14.71           28.68           30.43
    18.15           22.34           23.32           36.43
    20.82           19.49           32.85           37.04
    14.06           24.92           33.90           29.76
    14.26           20.20           23.42           33.88

a. Run a one-way analysis of variance on these data. Test the null hypothesis
of no differences among the four teaching methods at the .05 level of
significance.

b. After completing the F-test, use both the T-method and the S-method to
determine which of the six pairs of means can be considered to be significantly
different. Which method produces a greater number of significant differences?

c. Set up the 95% confidence interval for $[(\mu_1 + \mu_3)/2] - [(\mu_2 + \mu_4)/2]$ using
the S-method.

7. Klausmeier (1963) studied the effects of accelerating bright older pupils from the 
second grade to the fourth grade, thus skipping the third grade. The following 
five groups of n = 20 pupils each were tested on the Metropolitan Achievement 
Test during their year in the fifth grade; 

1. Acc: a group of 20 bright pupils who were promoted to the fourth grade from
the second grade after a five-week special summer session. The Acc group is
now in the fifth grade. Average age of this group is 9 years 6 months.

2. SY: a group of 20 pupils who did not skip the third grade and who are of 
superior (5) ability and are young ( Y), i.e., below median age of fifth graders. 
The SY group is now in the fifth grade and has an average age of 10 years 
0 months. 

3. SO: 20 nonaccelerated pupils of superior (S) ability and who are older (O) 
than the median fifth grader. The SO group, now in the fifth grade, has an 
average age of 10 years 5 months. 

4. AY: 20 nonaccelerated pupils of average ability who are younger than the
median fifth grader. The AY group, now in the fifth grade, has an average
age of 10 years, 0 months.

5. AO: 20 nonaccelerated fifth graders of average (A) ability who are above the 
age of the median fifth grader (average age, 10 years, 6 months). 

The following table presents data for the Total Language scores on the Metro- 
politan Achievement Test for the five groups: 

                                Group

               Acc        SY         SO         AY         AO

n              20         20         20         20         20
$\bar{X}$      55.15      56.40      63.55      47.15      53.45
$s_x$          6.86       4.59       6.86       7.75       12.39

a. Perform an F-test at the .05 level of $H_0$: $\mu_1 = \cdots = \mu_5$ on the above data.

b. Follow the F-test in (a) with T-method multiple comparisons to determine
which pairs of means differ significantly.

c. Use the S-method with $\alpha$ = .05 to place a confidence interval on $\psi = \mu_1 - [(\mu_2 + \cdots + \mu_5)/4]$.
In effect, then, test the hypothesis that $\mu_1$, the
mean for accelerated pupils, differs from the average of the remaining four
means.



17 


THE TWO-FACTOR
FIXED EFFECTS


17.1

THE LAYOUT AND
SYMBOLIZATION OF DATA
                              Factor B, sex

Factor A,
teaching method          Male (1)         Female (2)

                         $X_{111}$        $X_{121}$
       1                 $X_{112}$        $X_{122}$
                         $X_{113}$        $X_{123}$
                         $X_{114}$        $X_{124}$

                         $X_{211}$        $X_{221}$
       2                 $X_{212}$        $X_{222}$
                         $X_{213}$        $X_{223}$
                         $X_{214}$        $X_{224}$

                         $X_{311}$        $X_{321}$
       3                 $X_{312}$        $X_{322}$
                         $X_{313}$        $X_{323}$
                         $X_{314}$        $X_{324}$

FIG. 17.1 Layout of data in a 3 x 2 two-factor ANOVA design with
four observations per cell ($X_{ijk}$ notation, where i = 1, 2, 3 for method,
j = 1, 2 for sex, and k = 1, 2, 3, 4 for pupil within method-sex group).


by X. Observations on X are taken by administering a standardized test of
reading comprehension. If four boys and four girls were taught to read by
method 1, four boys and four girls by method 2, etc., the data could be
tabulated and symbolized as shown in Fig. 17.1.

A total of 24 pupils participated in this experiment. $X_{111}$ represents the
reading comprehension test score of the “first” (arbitrarily designated) boy
who studied under method 1. $X_{324}$ stands for the test score of the “fourth”
girl who studied under method 3. The “3” stands for the method, the “2”
for the sex (male = 1, female = 2), and the “4” designates the arbitrarily
labeled fourth pupil in the group of four girls under method 3.

In general, an observation in a two-factor ANOVA design is denoted
by $X_{ijk}$, where i is the subscript for factor A and takes on the values 1, 2, ..., I;
j is the subscript for factor B and takes on the values 1, 2, ..., J; and k is
the subscript that identifies the observations within a cell (combination of
levels of factors A and B) of the design and takes on the values 1, 2, ..., n.
If we want to denote the score of the third (3) girl (2) studying under the first
(1) method, we let i = 1, j = 2, and k = 3 to obtain $X_{123}$. Often with
statistical notation one in effect reads subscripts or summation signs from
right to left. To summarize, in $X_{ijk}$,

    i = 1, 2, ..., I for factor A,
    j = 1, 2, ..., J for factor B, and
    k = 1, 2, ..., n for “within cells.”
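
The three-subscript notation maps naturally onto a lookup keyed by (i, j, k). The sketch below mirrors the 3 x 2 layout with four observations per cell; the scores themselves are made-up placeholders (not data from the text), and the helper name `cell` is ours.

```python
# Dimensions of the layout: I methods, J sexes, n pupils per cell.
I, J, n = 3, 2, 4

# Hypothetical scores keyed by (i, j, k), with 1-based subscripts as in the text.
scores = {
    (i, j, k): 10 * i + 5 * j + k
    for i in range(1, I + 1)
    for j in range(1, J + 1)
    for k in range(1, n + 1)
}


def cell(scores, i, j):
    """Return the n observations in cell (i, j), in order of k."""
    return [scores[(i, j, k)] for k in range(1, n + 1)]


# Score of the fourth girl (j = 2) taught by method 3.
print(scores[(3, 2, 4)])
# All four girls taught by method 1.
print(cell(scores, 1, 2))
```

Keeping the subscripts 1-based, rather than shifting to 0-based list positions, makes the code read the same way as the notation.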

[...] from the conventions established in
Chapters [...]. In particular, the right-most subscript now indexes
observations within a cell; previously the left index ranged over within-cell
observations. [...] design [...] 1, 2, ..., n, whereas for the one-factor [...]


17.2 

A MODEL FOR THE DATA 


[...] to determine how the size of the scores
varies with the levels of the [...], e.g., whether boys score higher than
girls, whether method 2 gives higher scores than method 1, etc. Toward
this end, we shall now develop a model, a generalization of that
used with the one-factor ANOVA, to explain, in a general way, how the data
are related to factors A and B.

[...] factor ANOVA would be
one that involved a term $\mu$, the same
for each datum, which describes the [...] the scores; I terms $\alpha_i$, one
for each level of factor A, which describe the excess of the data above or
below $\mu$ for those [...]; J terms $\beta_j$, one for each
level of factor B, which describe [...] above or below $\mu$ for the scores
at the jth level of factor B; and a term $e_{ijk}$ for each score that would
essentially make up the difference between the score and the sum of the other
terms in the model. Under this rudimentary model, a score $X_{ijk}$ would be
represented as follows:

    $X_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}$.     (17.1)


tb ™ foghnet tendf m a'ddTl'i'" S ' nc ' al , 'nd to be S’T = ^ if reedi 
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« n + a , + * , 
(17.1). The more useful and descriptive model differs from Eq. (17.1) by a
term that denotes the effect of the unique result of combining level i of
factor A with level j of factor B. “Unique” in this context means that the
outcome one would obtain from combining level i of A with level j of B may
not be the simple sum of $\alpha_i$ and $\beta_j$. If not, perhaps a new term $\alpha\beta_{ij}$, which is not
the product of $\alpha_i$ and $\beta_j$, is needed to describe the scores in the ijth cell.
Such a term is called an interaction term. [The concept of the interaction of
two independent variables is due to the famed English statistician-geneticist,
Ronald Fisher. It was Fisher's concepts of experimental control through
randomization and the study of the effects of several factors and their
interactions simultaneously that successfully overthrew the “one variable at a
time” orthodoxy of experimental agriculture in the early 1900's. See Fisher
(1925) and Stanley (1966b).] If an interaction term is needed in the model to
describe the scores, then knowing the general effect of level i of A and of
level j of B is not enough to predict the outcome in cell ij. An interaction is
tenuously analogous to two polarized lenses: light passes through each lens
individually, but when one is laid on top of the other (when they are combined)
no light passes through the pair. The separate effects of the lenses differ from
their combined effect.

The expanded model, the model upon which the analyses in this chapter
are based, is

    $X_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + e_{ijk}$.     (17.2)

Without any loss of generality the $\alpha$, $\beta$, and $\alpha\beta$ terms in the above model are
assumed to sum to zero over both i and j, i.e.,
$\sum_i \alpha_i = \sum_j \beta_j = \sum_i \alpha\beta_{ij} = \sum_j \alpha\beta_{ij} = 0$.
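
The sum-to-zero conventions pin down a unique decomposition of any table of cell means into $\mu$, $\alpha_i$, $\beta_j$, and $\alpha\beta_{ij}$. The sketch below illustrates this for a 2 x 2 table; the cell means are hypothetical numbers chosen for easy arithmetic, equal cell sizes are assumed, and the function name `decompose` is ours.

```python
def decompose(cell_means):
    """Split a table of cell means (rows = levels of A, columns = levels of B)
    into mu, alpha_i, beta_j, and interaction terms that sum to zero."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    alpha = [sum(row) / n_cols - mu for row in cell_means]
    beta = [sum(cell_means[i][j] for i in range(n_rows)) / n_rows - mu
            for j in range(n_cols)]
    # Whatever is left after mu, alpha_i, and beta_j is the interaction term.
    interaction = [[cell_means[i][j] - mu - alpha[i] - beta[j]
                    for j in range(n_cols)] for i in range(n_rows)]
    return mu, alpha, beta, interaction


# Hypothetical cell means: the row effect is larger in column 2 than in column 1.
mu, alpha, beta, interaction = decompose([[2, 6], [4, 12]])
print(mu, alpha, beta, interaction)
```

If every interaction term came out zero, each cell mean would be the simple sum $\mu + \alpha_i + \beta_j$, which is the situation described by Eq. (17.1).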

In Sec. 17.4 we shall take a closer look at the nature of an ANOVA 
interaction and examine some illustrations. 


17.3 

LEAST-SQUARES ESTIMATION 
OF THE MODEL 

With data in hand and having adopted a particular abstract model to explain
them, we still have the task of relating the data to the model. What features
of the data influence the value of $\mu$ in Eq. (17.2)? How can the data be
manipulated to disclose information about the values of the $\alpha_i$'s, the $\beta_j$'s, and
the $\alpha\beta_{ij}$'s?

This is the same general problem we faced when we sought to predict 
Y from X by means of a straight line or when we sought to explain the 
differences between factor levels in a one-factor ANOVA. The problem is 
basically the same, and our approach to its solution is no different. We fit 
the model in Eq. (17-2) to the data so that a criterion of least squares is 
satisfied. In this instance as in the others, the criterion of least squares is as 
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right to left. To summarize, in X m , 

/ — 1, 2, . . . , / for factor A, 

J = 1*2 J for factor B, and 

k ~ 1, 2, .... n for “within cells." 
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(17.1), The more useful and descriptive mode 1 differs from Eq. (17.1) by a 
term that denotes the effect of the unique result of combining level * of 
factor A with level j of factor B. “Unique'’ in this context means that the 
outcome one would obtain from combining level / of A with level j of B may 
not be the simple sum of a, and /?,. If not, perhaps a new a/?,,, which is not 
the product of a, and is needed to describe the scores in the ijth cell. 
Such a term is called an interaction term. [The concept of the interaction of 
two independent variables is due to the famed English statistician-geneticist, 
Ronald Fisher. It was Fisher's concepts of experimental control through 
randomization and the study of the effects of several factors and their 
interactions simultaneously that successfully overthrew the “ one variable at a 
time” orthodoxy of experimental agriculture in the early 1900’s. See Fisher 
(1925) and Stanley (19666).] If an interaction term is needed in the model to 
describe the scores, then knowing the general effect of level / of A and of 
level j of B is not enough to predict the outcome in cell ij. An interaction is 
tenuously analogous to two polarized lenses: iight passes through each lens 
individually, but when one is laid on top of the other (when they are combined) 
no light passes through the pair. The separate effects of the lenses differ from 
their combined effect. 

The expanded model, the model upon which the analyses in this chapter 
are based, is 

Km — M + «< + A + a P,s + e m- (17.2) . 

Without any loss of generality the a, /?, and a/5 terms in the above model are 
assumed to sum to zero over both / and j, i.e., X *i — 2 A — 2 “Ay — 

I«A, = 0. 

In Sec. 17.4 we shall take a closer look at the nature of an ANOVA 
interaction and examine some illustrations. 


17.3 

LEAST-SQUARES ESTIMATION 
OF THE MODEL 

With data in hand and having adopted a particular abstract model to explain 
them, we still have the task of relating the data to the model. What features 
of the data influence the value of $\mu$ in Eq. (17.2)? How can the data be
manipulated to disclose information about the values of the $\alpha_i$'s, the $\beta_j$'s, and
the $\alpha\beta_{ij}$'s?

This is the same general problem we faced when we sought to predict 
Y from X by means of a straight line or when we sought to explain the 
differences between factor levels in a one-factor ANOVA. The problem is 
basically the same, and our approach to its solution is no different. We fit 
the model in Eq. (17.2) to the data so that a criterion of least squares is 
satisfied. In this instance as in the others, the criterion of least squares is as
follows: (1) values are substituted into Eq. (17.2) for $\mu$, $\alpha_1, \ldots, \alpha_I$,
$\beta_1, \ldots, \beta_J$, $\alpha\beta_{11}, \ldots, \alpha\beta_{IJ}$; (2) these values along with the data
determine, by subtraction, values for the IJn errors $e_{ijk}$; (3) when the sum of
the squared errors so determined is as small as it is possible to make it, the
least-squares estimates of $\mu$, the $\alpha_i$'s, the $\beta_j$'s, and the $\alpha\beta_{ij}$'s have been found.

FIG. 17.2 (Data layout of the 2 x 2 design, factor A by factor B. Scores
shown include $X_{111} = 2$, $X_{112} = 4$, $X_{121} = 1$, $X_{122} = 3$, $X_{211} = 6$, and $X_{222} = 3$.)

FIG. 17.3 (Layout, by cell, of the model terms such as $\alpha\beta_{11}$, $\alpha\beta_{12}$, and $\alpha\beta_{22}$.)

For example, eight scores are gathered in a simple 2 x 2 design, with
data as shown in Fig. 17.2. We postulate the following model for the
data:

$$X_{ijk} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + e_{ijk}, \qquad i = 1, 2; \quad j = 1, 2; \quad k = 1, 2.$$

The terms of this model are assumed to underlie the data in
Fig. 17.2 in the manner depicted in Fig. 17.3. Hence it is assumed, for example,
that

$$X_{111} = \mu + \alpha_1 + \beta_1 + \alpha\beta_{11} + e_{111}.$$

By subtraction of appropriate terms, we see that

$$e_{111} = X_{111} - (\mu + \alpha_1 + \beta_1 + \alpha\beta_{11}). \tag{17.3}$$

Suppose we try particular values for the terms, among them $\mu = 5$, $\alpha_1 = 2$,
$\beta_1 = 0$, and all the $\alpha\beta_{ij}$'s equal to zero. Will this choice produce the
smallest possible sum of squared errors? We can calculate each of the eight
errors by substitution, as in Eq. (17.3). For the score of 4 in the first cell,
for instance, the error is

$$4 - (5 + 2 + 0 + 0) = -3.$$

Calculating the eight errors in this way, squaring each, and summing the
squares produces the sum of squared errors for the trial values. Is this the
smallest sum of squared errors possible? No. It happens that the smallest sum
of squared errors is produced by the following least-squares estimates of the
parameters of the model:

    $\hat{\mu} = 4$             $\widehat{\alpha\beta}_{11} = -.5$
    $\hat{\alpha}_1 = -1.5$     $\widehat{\alpha\beta}_{12} = .5$
    $\hat{\alpha}_2 = 1.5$      $\widehat{\alpha\beta}_{21} = .5$
    $\hat{\beta}_1 = 1$         $\widehat{\alpha\beta}_{22} = -.5$
The sum of the squared estimated errors, which were obtained by
substituting the above least-squares estimates into Eq. (17.2) along with the
data $X_{ijk}$, is equal to 8.00, the smallest possible value for any choices of $\hat{\mu}$,
the $\hat{\alpha}_i$'s, the $\hat{\beta}_j$'s, and the $\widehat{\alpha\beta}_{ij}$'s.

What manipulations of the data will produce the least-squares estimates?
As was true in the one-factor ANOVA, the least-squares estimates of the
terms in the model for the data are obtained by simple averaging of the data
in various ways. For example, the least-squares estimate of $\mu$ is just the
mean of all eight scores in the 2 x 2 table. The least-squares estimate of $\alpha_1$
is the mean of the four scores in row 1 of the table minus the mean of all
eight scores. The least-squares estimate of $\beta_1$ is the mean of the four scores
in column 1 minus the mean of all eight scores. The least-squares estimate of
$\alpha\beta_{11}$ is the mean of the two scores in the cell at the intersection of row 1 and
column 1 minus the mean of row 1 minus the mean of column 1 plus the mean
of all eight scores.

Let $\mu_{ij}$ be the mean of the population of scores from which those scores
in the ith row and the jth column of the data layout were sampled. Let
$\mu_{i.}$ be the average of the $\mu_{ij}$'s in the ith row; let $\mu_{.j}$ be the average of the
$\mu_{ij}$'s in the jth column; and let $\mu$ be the average of all the $\mu_{ij}$'s. The
least-squares estimates can be characterized in terms of their definition in the
sample and the long-range average (or expected) value that they attain in
the population. This has been done in Table 17.1.


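FIG. 17.4

TABLE 17.1 ESTIMATION IN THE TWO-FACTOR FIXED-EFFECTS ANOVA

Term in the model    Least-squares estimate                                               Population value

$\mu$                $\bar{X}_{...}$                                                      $\mu$
$\alpha_1$           $\bar{X}_{1..} - \bar{X}_{...}$                                      $\mu_{1.} - \mu$
$\alpha_2$           $\bar{X}_{2..} - \bar{X}_{...}$                                      $\mu_{2.} - \mu$
$\beta_1$            $\bar{X}_{.1.} - \bar{X}_{...}$                                      $\mu_{.1} - \mu$
$\beta_2$            $\bar{X}_{.2.} - \bar{X}_{...}$                                      $\mu_{.2} - \mu$
$\alpha\beta_{11}$   $\bar{X}_{11.} - \bar{X}_{1..} - \bar{X}_{.1.} + \bar{X}_{...}$      $\mu_{11} - \mu_{1.} - \mu_{.1} + \mu$
$\alpha\beta_{12}$   $\bar{X}_{12.} - \bar{X}_{1..} - \bar{X}_{.2.} + \bar{X}_{...}$      $\mu_{12} - \mu_{1.} - \mu_{.2} + \mu$
$\alpha\beta_{21}$   $\bar{X}_{21.} - \bar{X}_{2..} - \bar{X}_{.1.} + \bar{X}_{...}$      $\mu_{21} - \mu_{2.} - \mu_{.1} + \mu$
$\alpha\beta_{22}$   $\bar{X}_{22.} - \bar{X}_{2..} - \bar{X}_{.2.} + \bar{X}_{...}$      $\mu_{22} - \mu_{2.} - \mu_{.2} + \mu$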

Before proceeding to the problem of testing hypotheses about the terms 
of the two-factor ANOVA model, it would be wise to dwell a little longer on
the meaning of what has been called the interaction of factors A and B.


17.4 

THE NATURE OF INTERACTION 

In addition to being interested solely in the effect one variable (independent) 
has on another variable (dependent), investigators frequently ask whether 
this effect is the same for all levels of a second, independent variable. If 
this effect is not the same, an interaction between the two independent 
variables is said to exist. Suppose three different methods of teaching are 
being compared experimentally. It is found that one method is relatively 
best with high ability students whereas another method is relatively best with 
low ability students. We say that an interaction between teaching method 
and student ability exists. 

In a study on test-wiseness, multiple-choice and free-response items were 
given to both American and Indonesian students. Interest was not focused
on which type of item was harder or which group of students had the
superior scores, but on the interaction between nationality and item type.
Did one nationality do relatively better on one type of item and the other
nationality do relatively better on the other type (interaction); or was the
degree of superiority of one nationality over the other the same for both
types of item (no interaction)?

An example of hypothetical data displaying an interaction effect between 
sex and type of reading material on reading speed is shown in Fig. 17.5. 
Notice that the boys read relatively faster with material of one type; girls, 
with material of another type. The difference between boys’ and girls' 
reading speed is different for different reading matter. Notice that when
boys and girls are combined, i.e., the lines are averaged, the three word
groups are equally difficult. We say that there is no main effect for “type of
reading material.” Further, when the three reading materials are combined,
the boys and girls read equally fast.

FIG. 17.5 Graphic representation of the mean reading speed scores for
boys and girls on three types of reading material. (Abscissa: reading
material, i.e., fashion, American history, sports; one line each for girls and boys.)

Consider the data in Fig. 17.6. Here again the boys read relatively 
fastest with material on “sports.” Even though the girls are clearly faster 
than the boys, their superiority is not the same for all reading materials. 
Thus, there is an interaction between sex and reading materials (as well as a 
difference between the sexes). Notice that the requirement for interaction is
merely that the differences between the sexes on the different reading
materials be different; they do not have to be reversed, as was the case in
Fig. 17.5. This is why the words “relatively the same” and “relatively
fastest” were used in the introductory paragraphs.

FIG. 17.6 Graphic representation of the mean reading speed scores for
boys and girls on three types of reading material.

Differences across reading material in the difference between the sexes are 
reflected geometrically by nonparallel slopes. The fact that the solid and 
dotted lines are nonparallel reflects the presence of interaction. 


A. Demonstration that Absence 
of Interaction Between Two 
Factors Implies Parallel Lines in 
the Graph of Cell Means 


Suppose we have a 2 x 2 analysis of variance design as shown in Fig. 17.7.
In this diagram, $\mu$'s represent population means for rows, columns, and cells.
If no interaction exists between the two independent variables, then

$$\mu_{ij} = \mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu).$$

If interaction does exist, then this equality does not hold.

The means in Fig. 17.7 can be graphed in such a way that they form the
parallel lines shown in Fig. 17.8. For the lines in Fig. 17.8 to be parallel,
the vertical distance from one line to the other must be equal at all points.
In particular, the distance from $\mu_{11}$ to $\mu_{12}$ must equal the distance from
$\mu_{21}$ to $\mu_{22}$. See Stanley (1969).


FIG. 17.9 Graphic representation of the mean reading speed scores for
boys and girls on three types of reading material.

Assuming no interaction in the analysis of variance sense, we have

$$\mu_{11} = \mu + (\mu_{1.} - \mu) + (\mu_{.1} - \mu)$$

and

$$\mu_{12} = \mu + (\mu_{1.} - \mu) + (\mu_{.2} - \mu).$$

Also,

$$\mu_{21} = \mu + (\mu_{2.} - \mu) + (\mu_{.1} - \mu)$$

and

$$\mu_{22} = \mu + (\mu_{2.} - \mu) + (\mu_{.2} - \mu).$$

The distance from $\mu_{11}$ to $\mu_{12}$ is $\mu_{11} - \mu_{12}$, which you can easily show to
be equal to $\mu_{.1} - \mu_{.2}$. The distance from $\mu_{21}$ to $\mu_{22}$ is $\mu_{21} - \mu_{22}$, which also
equals $\mu_{.1} - \mu_{.2}$. The two distances between the pairs of means are equal.
Thus, the lines are parallel.
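
The demonstration above can also be checked numerically. The sketch below is not part of the original text; the population values (a grand mean of 10, row effects of 2 and -2, column effects of 3 and -3) and all function names are hypothetical. It builds cell means with no interaction, confirms that the within-row differences are equal (parallel lines), and shows that a nonadditive set of cell means yields nonzero interaction terms.

```python
def additive_cell_means(mu, alpha, beta):
    """Cell means mu_ij = mu + alpha_i + beta_j (no interaction)."""
    return [[mu + a + b for b in beta] for a in alpha]


def interaction_effects(cell_means):
    """mu_ij - mu_i. - mu_.j + mu for every cell."""
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def within_row_differences(cell_means):
    """Distance from mu_i1 to mu_i2 in each row of a two-column layout."""
    return [row[0] - row[1] for row in cell_means]


# Hypothetical population values, chosen only for illustration.
additive = additive_cell_means(10.0, [2.0, -2.0], [3.0, -3.0])
nonadditive = [[15.0, 9.0], [11.0, 8.0]]

additive_gaps = within_row_differences(additive)
nonadditive_gaps = within_row_differences(nonadditive)
additive_effects = interaction_effects(additive)
nonadditive_effects = interaction_effects(nonadditive)
```

With the additive means both rows show the same difference, so the graphed lines are parallel; changing a single cell mean makes the differences unequal and the interaction terms nonzero.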

Notice that the slopes of the lines for the boys (and, in turn, for the
girls) are the same in Fig. 17.5 and in Fig. 17.6. Consequently, the degree or
magnitude of the nonparallelism, and hence the degree of interaction, is
identical in Fig. 17.5 and Fig. 17.6.

Figure 17.9 illustrates a situation in which no interaction is present. 
Notice that the lines for the boys and girls are parallel and thus the difference 
in reading speed between the sexes is the same for each type of reading 
material. 

B. A Caution

One should not overlook artificial reasons for the existence of interaction.
For example, consider Fig. 17.10, which illustrates the interaction
between intelligence and learning condition.

FIG. 17.10 Illustration of a possibly artificial interaction between
intelligence and learning condition. (Learning conditions on the abscissa:
textbook, program.)

In a sense, the definition
of an interaction effect in terms of a simple sum of population means and
the consequence that “no interaction” implies parallel lines when the means
are graphed is an imperfect interpretation of our notion of a “unique” and
“unpredictable” outcome resulting from combining the levels of two factors.
Certainly, many graphs of means from a two-factor design evidence non- 
parallel lines, yet we feel uncomfortable talking about “unique and un- 
predictable” effects of the combining of two levels when the nonparallel lines 
result merely from a “ceiling” or “cellar” effect on a particular paper-and-
pencil test. (For an example of such interaction, see Stanley, 1969.) This
illustrates the fact that perhaps allowing our least-squares estimation
procedure to determine that interaction effects would be measured by
$\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$ did not capture the full richness of our intuitive notion of
$\alpha\beta_{ij}$, the unique and unpredictable (from main effects, at least) result of
combining factor levels. Such failure to see our intuitive notions reflected 
perfectly in our mathematical models is a hazard (or reality) of an attempt to 
represent the real world mathematically. 

C. Two Types of Interaction 

In the statistical literature, a useful distinction is made (e.g., by Lubin,
1962a) between two types of interaction: ordinal and disordinal. In the
ordinal case, the rank order of the categories of one variable on the basis of
their dependent variable scores is the same within each category of the second
independent variable. In Fig. 17.6 we see an example of ordinal interaction.
It is ordinal because the girls read faster than the boys on each type of
reading material. Figure 17.10 also illustrates ordinal interaction. However,
in Fig. 17.5 it is not the case that girls read faster than boys on each type of
reading material. Figure 17.5 presents an example of disordinal interaction.
When the lines do not cross, interaction is said to be “ordinal”; when the lines
cross, interaction is said to be “disordinal.”

The importance of this distinction for interpretation is this. When inter- 
action is ordinal, it makes sense to assume, for example, that when girls 
score higher than boys on the dependent variable the superiority exists for all 
types of reading material. That is, even though an interaction exists, it is 
still meaningful to make a single statement about boys and girls, over-all, 
without qualification or reference to some other variable. However, when 
there is a disordinal interaction, asserting that girls score higher (i.e., read 
faster) than boys cannot be understood to mean the superiority is main- 
tained on all types of reading material. Disordinal interaction indicates that 
only in some situations are girls superior to boys on the dependent variable. 

Let’s consider an example in which teaching methods 1 and 2 are being 
tried out in either urban or rural settings. If there is a disordinal interaction 
between “teaching method” and “setting” and a significant difference 
between methods in favor of method 1, it would not be possible to claim
that method 1 is better than 2 “across the board.” Before we could recom- 
mend that a school adopt one method or the other, we would first want to 
know whether the school was rural or urban. 

The advice usually given is that whenever there is a significant inter- 
action, one should plot the means for the various combinations. Unfor- 
tunately, research reports usually do not present sufficient data to allow the 
reader to follow this advice. 

It should be emphasized that “ordinality” and “disordinality” are 
properties of graphs. A choice exists between placing factor A or factor B on 
the abscissa when graphing an interaction. The same cell means can give an 
ordinal interaction when factor A is on the abscissa and a disordinal inter- 
action when factor B is on the abscissa. 
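
The dependence of ordinality on the choice of abscissa can be illustrated with a short Python sketch, which is not part of the original text. The 2 x 2 cell means and the function names below are hypothetical; "crossing" is taken here to mean that the sign of the difference between two lines changes across positions on the abscissa.

```python
def lines_cross(lines):
    """lines[l][p] is the mean for line l at abscissa position p.

    Returns True when some pair of lines reverses rank order across
    positions (a disordinal graph), and False otherwise (an ordinal graph).
    """
    for a in range(len(lines)):
        for b in range(a + 1, len(lines)):
            diffs = [x - y for x, y in zip(lines[a], lines[b])]
            if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
                return True
    return False


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


# Hypothetical cell means: rows are levels of factor A, columns levels of B.
cell_means = [
    [1.0, 4.0],
    [2.0, 3.0],
]

# One line per level of A, with factor B on the abscissa.
disordinal_with_b_on_abscissa = lines_cross(cell_means)

# One line per level of B, with factor A on the abscissa.
disordinal_with_a_on_abscissa = lines_cross(transpose(cell_means))
```

For these means the lines cross when factor B is on the abscissa but not when factor A is, so the same interaction appears disordinal in one graph and ordinal in the other.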

A technical article well worth reading when one has acquired an intro- 
duction to analysis of variance is Lubin (1962a). 


17.5 

STATEMENT OF NULL 
HYPOTHESES 

If in Fig. 17.1 factor B, “sex,” was disregarded, the data would be 
identical to those gathered in a one-factor experiment comparing three
teaching methods. Eight observations would have been gathered under each
level of factor A, and the one-factor ANOVA model would be appropriate
for a statistical inferential test of the hypothesis that the three population
means underlying the teaching methods were equal. This null hypothesis,
namely that the population means for the three teaching methods are equal,
is identical to one null hypothesis of interest in the two-factor ANOVA.
Specifically, we are interested in whether the data gathered in a two-factor
ANOVA support or run counter to a decision to accept as true the statement
$H_0$ that $\mu_{1.} = \mu_{2.} = \mu_{3.}$.

In any two-factor ANOVA, the null hypothesis for factor A can be 
stated as follows: 

$$H_0: \mu_{1.} = \mu_{2.} = \cdots = \mu_{I.}$$


(Null hypothesis for factor A) 
Notice that the equality of the I population means underlying the I levels of
factor A has implications that can be used to state $H_0$ in several equivalent
forms.

Because, when k = 1, 2, ..., n for every ijth factor-level combination,
$\mu$ is the average of the IJ population means (one for each cell), equality of the
$\mu_{i.}$'s implies that each $\mu_{i.}$ equals $\mu$. If each $\mu_{i.} = \mu$, then $\mu_{i.} - \mu = 0$ for
each level of factor A. So $\alpha_i$, the main effect for level i of factor A, is
equal to zero if the null hypothesis is true. The
following are equivalent ways of stating the null hypothesis for
factor A:

1. $H_0$: $\mu_{1.} = \cdots = \mu_{I.}$,

2. $H_0$: $\sum_{i=1}^{I} (\mu_{i.} - \mu)^2 = 0$,

3. $H_0$: $\sum_{i=1}^{I} \alpha_i^2 = 0$,

4. $H_0$: all $(\mu_{i.} - \mu) = 0$, and

5. $H_0$: all $\alpha_i = 0$.

We chose to talk about factor A instead of factor B arbitrarily. The
development of the statement of the null hypothesis for factor B is perfectly
analogous to the development of the statement of $H_0$ for factor A.
Specifically, as regards factor B, we are interested in rejecting or accepting
the hypothesis that the J population means underlying the J levels of
factor B are all equal. The following are equivalent ways of stating $H_0$ for
factor B:

1. $H_0$: $\mu_{.1} = \cdots = \mu_{.J}$,

2. $H_0$: $\sum_{j=1}^{J} (\mu_{.j} - \mu)^2 = 0$,

3. $H_0$: $\sum_{j=1}^{J} \beta_j^2 = 0$,

4. $H_0$: all $(\mu_{.j} - \mu) = 0$, and

5. $H_0$: all $\beta_j = 0$.
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There are many ways in which the null hypothesis about a main effect, 
i.e., about a single factor, can be false. For factor A, $\mu_{1.}$ could equal 20.5
and the remaining $\mu_{i.}$ could equal 29.1, for example. Or $\mu_{1.} = \mu_{2.} = 16.65$
and $\mu_{3.} = \mu_{4.} = 17.80$. In both instances, $H_0$ is false. All that it takes for
$H_0$ to be false is for at least two population means to be unequal. The
decision faced in an ANOVA is whether one should opt for the truth of $H_0$
or the truth of $H_1$, the logical converse of $H_0$, which is true when $H_0$ is false.
$H_1$, the alternative hypothesis, can be stated in the following equivalent ways
for factor A:

1. $H_1$: $\mu_{i.} \neq \mu_{i^*.}$, where i and i* are distinct,

2. $H_1$: $\sum_{i=1}^{I} (\mu_{i.} - \mu)^2 \neq 0$,

3. $H_1$: $\sum_{i=1}^{I} \alpha_i^2 \neq 0$,

4. $H_1$: $\mu_{i.} - \mu \neq 0$ for at least one i, and

5. $H_1$: $\alpha_i \neq 0$ for at least one i.

Each of the above equivalent statements will be true if and only if the
null hypothesis for factor A is false. Hence, if we reject $H_0$, we automatically
accept $H_1$.

The form of the alternative hypothesis for factor B is perfectly analogous
to the form of $H_1$ for factor A. The reader may wish to state $H_1$ for factor B
in at least five equivalent ways.

There remains one hypothesis of interest, and it concerns the collection 
of interaction terms, $\alpha\beta_{ij}$. In Sec. 17.4 our attention was directed toward two
sets of conditions: (1) the graph of the population means produced parallel
lines; (2) the graph of the population means produced nonparallel lines. We
saw in Sec. 17.4 that if no interaction exists between A and B, i.e., if the
graph of the population means shows parallel lines, then $\mu_{ij}$ will equal
$\mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu) = \mu + \alpha_i + \beta_j$. An equivalent condition is
that $\mu_{ij} = \mu_{i.} + \mu_{.j} - \mu$. Now if this condition is satisfied, then
$\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0$ for all the $\mu_{ij}$'s.

Or, in terms of the model in Eq. (17.2),

$\alpha\beta_{ij} = 0$ for all i and j.

If the lines in the graph of the cell means fail to be parallel at even a single
point on A, then at least one $\alpha\beta_{ij}$ is not equal to zero. Hence, parallel lines
in the graph of the IJ population means correspond to all of the $\alpha\beta_{ij}$'s
equaling zero; nonparallel lines correspond to at least one of the $\alpha\beta_{ij}$'s not
equaling zero. These two conditions represent the null hypothesis and the
alternative hypothesis, respectively, about the interaction of factors A and
B. There are several equivalent ways of stating the null hypothesis $H_0$ and
the alternative hypothesis $H_1$ about the interaction effects. Some of these
follow:

Equivalent statements of $H_0$ for the interaction of A and B

1. $H_0$: all $(\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu) = 0$,

2. $H_0$: all $\alpha\beta_{ij} = 0$,

3. $H_0$: $\sum_{i=1}^{I} \sum_{j=1}^{J} (\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu)^2 = 0$, and

4. $H_0$: $\sum_{i=1}^{I} \sum_{j=1}^{J} \alpha\beta_{ij}^2 = 0$.

Equivalent statements of $H_1$ for the interaction of A and B

1. $H_1$: $\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu \neq 0$ for at least one $\mu_{ij}$,

2. $H_1$: $\alpha\beta_{ij} \neq 0$ for at least one $\alpha\beta_{ij}$,

3. $H_1$: $\sum_{i=1}^{I} \sum_{j=1}^{J} (\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu)^2 \neq 0$, and

4. $H_1$: $\sum_{i=1}^{I} \sum_{j=1}^{J} \alpha\beta_{ij}^2 \neq 0$.

These are the three pairs of hypotheses that are typically of
interest in the two-factor ANOVA: (1) $H_0$ and $H_1$ for factor A, (2) $H_0$ and $H_1$
for factor B, and (3) $H_0$ and $H_1$ for the interaction of A and B. In the
remaining sections of this chapter we shall see how the data gathered in a
two-factor design bear on the decision to accept either
$H_0$ or $H_1$ for A, for B, and for the interaction of A and B.

17.6 

SUMS OF SQUARES IN THE
TWO-FACTOR ANOVA

Several steps lie between the three null hypotheses and the significance
tests that bear on them. If the markers along the way do not seem to point
toward your route, bear in mind that the purpose of these steps will not
become clear until you are on the last leg of the journey.

Four sources of variation are identified in the two-factor ANOVA:
(1) factor A, (2) factor B, (3) the interaction of A and B, and (4)
within cells, or combinations of levels of A and B. We shall define a sum of
squares for each source of variation in turn.

Sum of Squares for Factor A

The sum of squares for factor A is defined as

$$SS_A = nJ \sum_{i=1}^{I} \hat{\alpha}_i^2 = nJ \sum_{i=1}^{I} (\bar{X}_{i..} - \bar{X}_{...})^2. \tag{17.4}$$

Recall that $\alpha_i = \mu_{i.} - \mu$ is estimated by $\bar{X}_{i..}$, the mean of the nJ scores
in level i of factor A, minus $\bar{X}_{...}$, the mean of all nIJ scores in the layout of
data. The sum over all I levels of A of these squared estimates,
$(\bar{X}_{i..} - \bar{X}_{...})^2$, multiplied by nJ, is called the sum of squares for factor A.

Equation (17.4) for SS A is definitional and does not make for easy 
computing. We shall return to the matter of actually calculating SS A in 
Sec. 17.9. 

Sum of Squares for Factor B 

The sum of squares for factor B is nI times the sum of the squared least-
squares estimates of the $\beta_j$'s:

$$SS_B = nI \sum_{j=1}^{J} \hat{\beta}_j^2 = nI \sum_{j=1}^{J} (\bar{X}_{.j.} - \bar{X}_{...})^2. \tag{17.5}$$

Notice that nI is the number of scores averaged to obtain $\bar{X}_{.j.}$. In $SS_A$, nJ
was the number of scores averaged to obtain $\bar{X}_{i..}$. Again, the above formula
for $SS_B$ is not convenient for computation. Computational formulas will be
presented in Sec. 17.9.

Sum of Squares for the 
Interaction of A and B 

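$$SS_{AB} = n \sum_{i=1}^{I} \sum_{j=1}^{J} \widehat{\alpha\beta}_{ij}^{\,2} = n \sum_{i=1}^{I} \sum_{j=1}^{J} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})^2. \tag{17.6}$$

Notice as before that n, the factor multiplying the sum, is the number of
scores averaged to obtain $\bar{X}_{ij.}$.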

Sum of Squares “Within” Cells 

There remains one sum of squares, $SS_w$, the sum of squares within cells.

$$SS_w = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n} (X_{ijk} - \bar{X}_{ij.})^2. \tag{17.7}$$

The meaning of these four sums of squares will begin to emerge when the 
corresponding mean squares and their expected values are considered in the 
following sections. 

Incidentally, though it is of no great significance in and of itself, the sum
of the squared deviations of each of the nIJ scores in a two-factor design
around $\bar{X}_{...}$ is exactly equal to $SS_A + SS_B + SS_{AB} + SS_w$, i.e.,

$$\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n} (X_{ijk} - \bar{X}_{...})^2 = SS_A + SS_B + SS_{AB} + SS_w. \tag{17.8}$$

The term on the left-hand side of Eq. (17.8) is called the total sum of squares
and denoted $SS_{total}$ for obvious reasons. The total variance of the collection
of nIJ scores is analyzed into four additive components; hence, we use the
expression analysis of variance, abbreviated ANOVA or sometimes anova.
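
The definitional formulas (17.4) through (17.7) and the additivity stated in Eq. (17.8) can be verified on a small data set. The following Python sketch is not part of the original text. It assumes the eight scores of the 2 x 2 example of Sec. 17.3 are 2, 4, 1, 3, 6, 8, 5, 3 (two of these are inferred from the reported estimates), and its function names are illustrative only.

```python
# scores[i][j] holds the n scores in row i (factor A), column j (factor B).
# ASSUMPTION: two of these values are inferred, not read from the figure.
scores = [
    [[2, 4], [1, 3]],
    [[6, 8], [5, 3]],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sums_of_squares(scores):
    """Definitional sums of squares for an equal-n two-factor layout."""
    I, J, n = len(scores), len(scores[0]), len(scores[0][0])
    all_scores = [x for row in scores for cell in row for x in cell]
    grand = mean(all_scores)
    row_means = [mean(x for cell in row for x in cell) for row in scores]
    col_means = [mean(x for row in scores for x in row[j]) for j in range(J)]
    cell_means = [[mean(cell) for cell in row] for row in scores]

    ss_a = n * J * sum((m - grand) ** 2 for m in row_means)          # Eq. (17.4)
    ss_b = n * I * sum((m - grand) ** 2 for m in col_means)          # Eq. (17.5)
    ss_ab = n * sum(                                                 # Eq. (17.6)
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(I) for j in range(J)
    )
    ss_w = sum(                                                      # Eq. (17.7)
        (x - cell_means[i][j]) ** 2
        for i in range(I) for j in range(J) for x in scores[i][j]
    )
    ss_total = sum((x - grand) ** 2 for x in all_scores)             # left side of Eq. (17.8)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "within": ss_w, "total": ss_total}


ss = sums_of_squares(scores)
components = ss["A"] + ss["B"] + ss["AB"] + ss["within"]
```

For these scores the four components sum to the total sum of squares, as Eq. (17.8) requires.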

17.7 

DEGREES OF FREEDOM 


Each sum of squares in the two-factor ANOVA is converted into a mean
square by dividing it by its degrees of freedom. The degrees of freedom
of a sum of squares equal the number of least-squares estimates of effects
that are summed in it, minus the number of independent linear
restrictions placed on those estimates. Admittedly, this is a difficult and
abstract notion.

It was natural and unrestrictive to specify in the model in Eq. (17.2) that
$\alpha_1 + \cdots + \alpha_I = 0$. Furthermore, although we did not mention it when it
happened, it was necessary to assume that $\hat{\alpha}_1 + \cdots + \hat{\alpha}_I = 0$ before the
solution to the mathematical criterion of least-squares estimation could be
found. Indeed, as they must, the least-squares estimates satisfy this
restriction: $\hat{\alpha}_1 + \cdots + \hat{\alpha}_I = 0$. Similarly,

$$\sum_{j=1}^{J} \hat{\beta}_j = 0.$$

The I estimates $\hat{\alpha}_i$ enter into the calculation of $SS_A$, and they
must conform to the single linear restriction that their sum be zero. Hence,
the degrees of freedom for $SS_A$ are I - 1. An exactly analogous line of
reasoning would lead to the correct conclusion that $SS_B$ has degrees of
freedom equal to J - 1.

The calculation of $SS_{AB}$ involves the IJ least-squares estimates of the
$\alpha\beta_{ij}$ terms. The restrictions it was necessary to impose on these estimates to
solve the least-squares problem were that summing the estimates across
columns for any given row yields a sum of zero, and summing the estimates
across rows for any given column yields a sum of zero:

$$\sum_{j=1}^{J} \widehat{\alpha\beta}_{ij} = 0 \quad \text{for each } i, \tag{17.9}$$

$$\sum_{i=1}^{I} \widehat{\alpha\beta}_{ij} = 0 \quad \text{for each } j. \tag{17.10}$$


The conditions in Eq. (17.9) are I in number; there are J restrictions
represented in Eq. (17.10). Not all I + J of these restrictions are independent,
however. Namely, given the restrictions in Eq. (17.9) and knowing that
$\sum_{i=1}^{I} \widehat{\alpha\beta}_{ij}$ equals zero for j = 1, ..., J - 1, it must necessarily follow that
$\sum_{i=1}^{I} \widehat{\alpha\beta}_{iJ} = 0$. Hence only I + J - 1 of the linear restrictions on the IJ
values of $\widehat{\alpha\beta}_{ij}$ are independent. Therefore, the degrees of freedom for $SS_{AB}$
are

IJ - (I + J - 1) = IJ - I - J + 1.

Notice that this expression “factors” into (I - 1)(J - 1).

The sum of squares within cells, $SS_w$, is actually the sum of the squares
of the nIJ least-squares estimates of the $e_{ijk}$ terms in the model in Eq. (17.2).
Any single $e_{ijk}$ is estimated by $\hat{e}_{ijk} = X_{ijk} - \bar{X}_{ij.}$.

Since $\hat{e}_{ijk}$ is the deviation of a score from its cell mean, the sum of the n
values of $\hat{e}_{ijk}$ within each cell is zero. Thus, there are IJ independent linear
restrictions on the nIJ values of $\hat{e}_{ijk}$. Consequently, the degrees of freedom
associated with $SS_w$ are IJn - IJ = IJ(n - 1).

The above results can be summarized as follows:

Sum of squares    Degrees of freedom

$SS_A$            I - 1
$SS_B$            J - 1
$SS_{AB}$         (I - 1)(J - 1)
$SS_w$            IJ(n - 1)

Now look at a very simple example of linear restriction. How many
independent restraints on the data of the 2 x 3 table in Fig. 17.11 are there?

Row            Column number       Row
number         1      2      3     sums

1              1      __     __    7
2              __     4      __    9

Column sums    4      6      __    __

FIG. 17.11

(Try filling in the four missing cell entries, the missing column sum, and the
missing grand sum.)

If the number of degrees of freedom is the number of cells (here,
2 x 3 = 6) minus the number of independent linear restrictions on the data,
how many degrees of freedom are there for a table of this type? That is,
how many cell entries are free to vary? Is this analogous to the degrees of
freedom for the interaction of factor A with factor B in a 2 x 3 factorial
design? [Hint: For that ANOVA, the cell entries are interaction residuals,
the six $(\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})$'s, and every row sum and column sum is
zero.]
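
The degrees of freedom summarized in this section can be tabulated with a few lines of Python. This sketch is not part of the original text, and the function name is hypothetical. It also relies on the standard fact, not stated above, that the four degrees of freedom add up to nIJ - 1, the degrees of freedom of the total sum of squares.

```python
def degrees_of_freedom(I, J, n):
    """Degrees of freedom for each sum of squares in an equal-n design."""
    return {
        "A": I - 1,
        "B": J - 1,
        "AB": (I - 1) * (J - 1),   # IJ cells minus I + J - 1 independent restrictions
        "within": I * J * (n - 1),
    }


# A 2 x 2 design with 12 observations per cell.
df_2x2 = degrees_of_freedom(2, 2, 12)
total_df_2x2 = sum(df_2x2.values())

# Interaction degrees of freedom for a 2 x 3 layout such as Fig. 17.11.
interaction_df_2x3 = degrees_of_freedom(2, 3, 1)["AB"]
```

Writing the interaction degrees of freedom as IJ - (I + J - 1) or as (I - 1)(J - 1) gives the same count for any I and J.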

17.8 

MEAN SQUARES 


For each sum of squares there is a mean square (abbreviated MS) defined by 
dividing the sum of squares by its degrees of freedom: 


$$MS_A = \frac{SS_A}{I - 1}, \qquad MS_B = \frac{SS_B}{J - 1},$$

$$MS_{AB} = \frac{SS_{AB}}{(I - 1)(J - 1)}, \qquad MS_w = \frac{SS_w}{IJ(n - 1)}.$$

As in the one-factor ANOVA, the mean squares are the final stage in
calculations leading toward significance tests of the null hypotheses.


17.9 

COMPUTATIONAL PROCEDURES 


In a two-factor ANOVA design in which factor A has I levels, factor B has
J levels, and each of the IJ cells contains n observations, the four sums of
squares are obtained most conveniently from the formulas in Table 17.2.

TABLE 17.2 COMPUTATIONAL FORMULAS FOR SUMS OF SQUARES IN THE TWO-FACTOR
ANOVA WITH EQUAL n's

$$SS_A = \sum_{i=1}^{I} \frac{\left(\sum_{j=1}^{J} \sum_{k=1}^{n} X_{ijk}\right)^2}{nJ} - \frac{\left(\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n} X_{ijk}\right)^2}{nIJ}$$

$$SS_B = \sum_{j=1}^{J} \frac{\left(\sum_{i=1}^{I} \sum_{k=1}^{n} X_{ijk}\right)^2}{nI} - \frac{\left(\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n} X_{ijk}\right)^2}{nIJ}$$

$$SS_{AB} = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{\left(\sum_{k=1}^{n} X_{ijk}\right)^2}{n} - SS_A - SS_B - \frac{\left(\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n} X_{ijk}\right)^2}{nIJ}$$

$$SS_w = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{n} X_{ijk}^2 - \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{\left(\sum_{k=1}^{n} X_{ijk}\right)^2}{n}$$

From a population of 2500 geometry classes, 48 were selected at random for
a research study. The researcher wished to determine the efficiency of two
different methods and two different media of geometry instruction and their
interaction. The classes were assigned at random in equal numbers to the
four combinations of media and methods. After one semester of
instruction the researcher administered the same geometry achievement
test to each class. Since the sampling unit was “classroom,” the unit of
analysis was taken to be the class mean (to the nearest whole number) on the
criterion achievement test. The results* are tabulated in Fig. 17.12.

                              Factor B, media
                       1                           2

Factor A,     1   2, 5, 6, 7, 4, 6            9, 12, 14, 15, 10, 13
method            7, 8, 4, 6, 7, 10           14, 16, 10, 13, 14, 17
                  cell sum = 72               cell sum = 157             row sum = 229

              2   10, 13, 14, 16, 10, 13      21, 25, 31, 33, 22, 26
                  14, 17, 11, 13, 15, 17      32, 34, 22, 30, 32, 35
                  cell sum = 163              cell sum = 343             row sum = 506

                  column sum = 235            column sum = 500           grand sum = 735

FIG. 17.12 Illustration of the calculations in a two-factor analysis of
variance.

In the above example, I = 2, J = 2, and n = 12. The required sums appear in
Fig. 17.12. In addition, one more quantity is needed. This quantity is:

$$\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{12} X_{ijk}^2 = 2^2 + 5^2 + \cdots + 35^2 = 14{,}969.$$

Only this quantity and those in Fig. 17.12 are required to find the four sums
of squares. $SS_A$ is calculated as follows:

$$SS_A = \frac{(229)^2 + (506)^2}{2 \cdot 12} - \frac{(735)^2}{2 \cdot 2 \cdot 12} = 1598.52.$$

The remaining sums of squares are found as indicated hereafter.

$$SS_B = \frac{(235)^2 + (500)^2}{2 \cdot 12} - \frac{(735)^2}{2 \cdot 2 \cdot 12} = 1463.02.$$

$$SS_{AB} = \frac{(72)^2 + (157)^2 + (163)^2 + (343)^2}{12} - 1598.52 - 1463.02 - \frac{(735)^2}{2 \cdot 2 \cdot 12} = 188.02.$$

$$SS_w = 14{,}969 - \frac{(72)^2 + (157)^2 + (163)^2 + (343)^2}{12} = 464.75.$$

* Twenty-four was subtracted from each score to make the calculations less
cumbersome. The sums of squares in an analysis of variance are not affected
by adding a constant c to each score. However, the mean of (X + c) is
$\bar{X} + c$, so every mean in the ANOVA has been reduced 24 points.

The sums of squares, degrees of freedom, and mean squares are reported
in Table 17.3.

TABLE 17.3 SUMS OF SQUARES, DEGREES OF FREEDOM, AND MEAN SQUARES FOR THE
2 x 2 DESIGN

Source of variation          df                        SS          MS

Factor A (method)            I - 1 = 1                 1598.52     1598.52
Factor B (media)             J - 1 = 1                 1463.02     1463.02
Interaction of A and B       (I - 1)(J - 1) = 1        188.02      188.02
Within cells                 IJ(n - 1) = 44            464.75      10.56


17.10 

EXPECTED VALUES OF MEAN 
SQUARES 


The computational aspects of the two-factor ANOVA are complete. Now
it is time to turn our attention once again to the purpose of the computations.
We are seeking a statistical inferential test for deciding whether the data
support the null hypothesis $H_0$ or the alternative hypothesis $H_1$ about the
main effects of factors A and B and their interaction effects. As was true in
the one-factor ANOVA, the expected values of the mean squares reveal how
they bear on the truth or falsity of the three null hypotheses.

The expected value (or “long-run average value”) of $MS_w$ is the mean of
all the $MS_w$'s that would be obtained if the same two-factor ANOVA design
were performed an infinite number of times with independent observations.
Another way to look at $E(MS_w)$ is that it is the variance of the population
from which the observations in any one cell of the two-factor ANOVA design
have been sampled. We shall assume that the variance of the population from
which the n observations in any cell have been sampled is equal to $\sigma_e^2$.
In other words, the variances of each of the populations underlying each of
the IJ cells are equal to the same value, $\sigma_e^2$. This is an extension to
the two-factor ANOVA of the assumption of homogeneous variances that we saw
in the one-factor ANOVA.

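If the n observations in the ijth cell are assumed to have been drawn from
a population with variance $\sigma_e^2$, then $E(s_{ij}^2) = \sigma_e^2$. $MS_w$
has the following form:

$$MS_w = \frac{SS_w}{IJ(n-1)} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{n}(X_{ijk} - \bar{X}_{ij.})^2}{IJ(n-1)}.$$

We recognize $MS_w$ to be the average of the IJ within-cell sample variances,
i.e.,

$$MS_w = \frac{\sum_{i}\sum_{j} s_{ij}^2}{IJ}.$$

We see, then, that

$$E(MS_w) = E\left[\frac{\sum_{i}\sum_{j} s_{ij}^2}{IJ}\right] = \frac{\sum_{i}\sum_{j} E(s_{ij}^2)}{IJ} = \frac{IJ\,\sigma_e^2}{IJ} = \sigma_e^2.$$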


The expected value of $MS_A$ is the average of an infinite number of $MS_A$'s,
each one obtained from an independent replication of the same two-factor
ANOVA design: I levels of factor A, J levels of factor B, and n cases in each
of the IJ cells. In the 2 × 2 experiment in the preceding section, the value of
$MS_A$ was 1598.52. This is just one observation from a hypothetically infinite
population of $MS_A$'s that could be generated by replicating the
methods × media experiment with a new set of 48 classrooms randomly
drawn from the population. We don't know whether 1598.52 is above or
below the population average value of $MS_A$, the expected value of
$MS_A$, $E(MS_A)$.

From one replication of the experiment, we can calculate one value of
$MS_A$. However, we cannot calculate the numerical value of $E(MS_A)$; if
we could there would be no need for inferential statistics. The algebraic
formula for $E(MS_A)$ in terms of the parameters of the model in Eq. (17.2)
can be found. We shall not bother with the details of the derivation here;
we shall simply state that $E(MS_A)$ has the following form:


$$E(MS_A) = \sigma_e^2 + \frac{nJ \sum_{i=1}^{I} \alpha_i^2}{I - 1}, \qquad (17.11)$$


where $\sigma_e^2$ is the variance of the error term in Eq. (17.2) and is
estimated by $MS_w$, and

$\alpha_i$ is the main effect of the ith level of factor A, i.e.,
$\alpha_i = \mu_{i.} - \mu$.


Suppose that, unknown to the researcher, the true value of $\sigma_e^2$ (the
true "within cell" variance) is 15.0, and that $\mu_{1.} = 12$ and
$\mu_{2.} = 22$. Since $\mu = \frac{1}{2}(12 + 22) = 17$, $\alpha_1$ is
$12 - 17 = -5$ and $\alpha_2$ is $22 - 17 = 5$. Substituting these values and
$J = 2$ and $n = 12$ into Eq. (17.11) yields

$$E(MS_A) = 15.0 + \frac{12 \cdot 2\,[(-5)^2 + 5^2]}{2 - 1} = 1215.$$
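
The substitution into Eq. (17.11) can be reproduced in a few lines. This is a sketch only; the helper name `expected_ms_a` and its argument names are hypothetical, and the population row means passed to it are exactly the quantities a researcher never knows in practice.

```python
def expected_ms_a(sigma2_e, n, J, mu_rows):
    """E(MS_A) = sigma_e^2 + n*J*sum(alpha_i^2)/(I - 1), as in Eq. (17.11)."""
    I = len(mu_rows)
    mu = sum(mu_rows) / I                   # grand mean of the row means
    alphas = [m - mu for m in mu_rows]      # main effects alpha_i = mu_i. - mu
    return sigma2_e + n * J * sum(a * a for a in alphas) / (I - 1)


print(expected_ms_a(15.0, n=12, J=2, mu_rows=[12, 22]))   # 1215.0
print(expected_ms_a(15.0, n=12, J=2, mu_rows=[17, 17]))   # 15.0
```

The second call sets both row means equal, so every main effect is zero and the expected mean square collapses to the within-cell variance.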

The above calculations were performed to illustrate the nature of the
terms in Eq. (17.11). It must be emphasized that one never actually
calculates a value for $E(MS_A)$. What is important is to note the relationship
between the expression for $E(MS_A)$ and the truth or falsity of the null
hypothesis about factor A. Notice that the third of several equivalent
statements of $H_0$ for factor A in Sec. 17.5 is $H_0$: $\sum_i \alpha_i^2 = 0$.
The quantity hypothesized to be zero in $H_0$ for factor A is the same
quantity, $\sum_i \alpha_i^2$, that appears in the numerator of the second term
for $E(MS_A)$. Thus, if $H_0$ is true, which means that
$\sum_i \alpha_i^2 = 0$, then

$$E(MS_A) = \sigma_e^2 + \frac{nJ \cdot 0}{I - 1} = \sigma_e^2.$$

On the other hand, if $H_0$ is false, which means that $\sum_i \alpha_i^2$ is
positive, then $E(MS_A) > \sigma_e^2$. This is an important relationship to
understand: If $H_0$ is true, we expect $MS_A$ to be the same size as the true
within-cell variance, $\sigma_e^2$; if $H_0$ is false, we expect $MS_A$ to be
larger than $\sigma_e^2$.

The expected value of $MS_B$ has the form

$$E(MS_B) = \sigma_e^2 + \frac{nI \sum_{j=1}^{J} \beta_j^2}{J - 1}. \qquad (17.12)$$

The null hypothesis $H_0$ for the main effect of factor B is
$H_0$: $\sum_j \beta_j^2 = 0$. If $H_0$ for factor B is true, then

$$E(MS_B) = \sigma_e^2 + \frac{nI \cdot 0}{J - 1} = \sigma_e^2.$$

If $H_0$ for factor B is true, we expect $MS_B$ to be equal to
$E(MS_w) = \sigma_e^2$; $MS_B$ can be expected to be larger than $\sigma_e^2$
when $H_0$ for factor B is false.

The expected value of $MS_{AB}$ is

$$E(MS_{AB}) = \sigma_e^2 + \frac{n \sum_{i=1}^{I}\sum_{j=1}^{J} (\alpha\beta)_{ij}^2}{(I - 1)(J - 1)}.$$

The null hypothesis for the interaction of factors A and B can be stated
as $H_0$: $\sum_i \sum_j (\alpha\beta)_{ij}^2 = 0$. Thus, we see that if $H_0$
for the interaction of A and B is true, then $E(MS_{AB})$ equals $\sigma_e^2$;
if $H_0$ is false, then $E(MS_{AB}) > \sigma_e^2$.

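All of these important relationships are summarized in Table 17.4.


TABLE 17.4 RELATIONSHIPS BETWEEN NULL HYPOTHESES AND EXPECTED VALUES OF
MEAN SQUARES

Mean square                          Expected mean square when      Expected mean square
                                     $H_0$ is true for source of    when $H_0$ is false
                                     variation in question

Factor A, $MS_A$                     $\sigma_e^2$                   $\sigma_e^2 + \frac{nJ \sum_i \alpha_i^2}{I - 1}$
Factor B, $MS_B$                     $\sigma_e^2$                   $\sigma_e^2 + \frac{nI \sum_j \beta_j^2}{J - 1}$
Interaction of A and B, $MS_{AB}$    $\sigma_e^2$                   $\sigma_e^2 + \frac{n \sum_i \sum_j (\alpha\beta)_{ij}^2}{(I - 1)(J - 1)}$
Within cells, $MS_w$                 $\sigma_e^2$                   $\sigma_e^2$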


As we shall see in greater detail later, the comparison of $MS_A$ with $MS_w$
reflects on the truth of $H_0$: $\sum_i \alpha_i^2 = 0$. If $H_0$ is true, then
$MS_A$ and $MS_w$ have the same expected value; if $H_0$ is false, then $MS_A$
has an expected value larger than $\sigma_e^2$ but the expected value of $MS_w$
is still $\sigma_e^2$. Naturally, if $MS_A$ proves to be much larger than $MS_w$
in a particular run of the experiment, we are inclined to think that $H_0$ is
false; if $MS_A$ and $MS_w$ are much the same size in a particular replication
of the experiment, we are inclined to think that they are both estimating the
same quantity, $\sigma_e^2$, which is the case when $H_0$ is true. Comparisons
of either $MS_B$ or $MS_{AB}$ with $MS_w$ bear on the truth or falsity of the
null hypotheses about the main effects of B and the interaction effects of A
and B, respectively, in the same manner that comparing $MS_A$ with $MS_w$ tells
us something about the plausibility of $\sum_i \alpha_i^2 = 0$ being true.

The problem of deciding when $MS_A$ ($MS_B$ or $MS_{AB}$) is sufficiently
larger than $MS_w$ so that we should consider the truth of $H_0$ implausible is
a problem of the variability of the values of mean squares from one replication
of the two-factor experiment to the next. This problem will occupy our
attention in the next section.


17.11

THE DISTRIBUTIONS OF THE
MEAN SQUARES


Before proceeding to the question of the statistical distributions of the four
mean squares in the two-factor ANOVA design, it is advisable to clarify the
nature of the inferences one makes in this situation. In the two-factor design
in Sec. 17.9, 12 observations were taken in each of the 2 × 2 = 4 cells. This
replication produced the following mean squares: $MS_A = 1598.52$,
$MS_B = 1463.02$, $MS_{AB} = 188.02$, and $MS_w = 10.56$. A second replication
of the experiment could be obtained by performing the same experiment with a
different set of 48 classrooms (12 in each cell). This second replication would
yield different values for each of the four mean squares. Conceptually at
least, third, fourth, fifth, etc. replications of the experiment could be run,
and each replication would produce its own set of four mean squares. Now the
question is, what will be the distribution of values of $MS_A$ obtained from an
infinite number of replications of the experiment? We ask the same question
individually about $MS_B$, $MS_{AB}$, and $MS_w$.

To answer these questions, it is necessary to add an assumption to our model
in Eq. (17.2). (Recall that the model in Eq. (17.2) is itself an assumption,
and that in Sec. 17.10 it was necessary to assume that the variances of the
hypothetical populations underlying the IJ cells of the experiment all have the
same value.) The assumption now added is that each of these populations is
normally distributed.
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The Distribution of MS W 


With the addition of the normality assumption, we know that the n
observations in any cell (the ijth cell) constitute a random sample from a
normal distribution with mean $\mu_{ij}$ and variance $\sigma_e^2$. Each cell
variance $s_{ij}^2$ is an unbiased estimator of $\sigma_e^2$; furthermore, we
can conclude that

$$\frac{s_{ij}^2}{\sigma_e^2} \sim \frac{\chi^2_{n-1}}{n - 1},$$

that is, $s_{ij}^2/\sigma_e^2$ has a distribution equal to the chi-square
distribution (df = n - 1) divided by n - 1. This statement is true for the IJ
independent cell variances $s_{ij}^2$. From the additive property of chi-square
variables, we know that

$$\sum_{i}\sum_{j} \frac{s_{ij}^2}{\sigma_e^2} \sim \frac{\chi^2_{IJ(n-1)}}{n - 1}.$$

Dividing the above quantities by IJ yields:

$$\frac{\sum_{i}\sum_{j} s_{ij}^2}{IJ\,\sigma_e^2} = \frac{MS_w}{\sigma_e^2} \sim \frac{\chi^2_{IJ(n-1)}}{IJ(n - 1)}.$$

Suppose the value of $\sigma_e^2$ is 15. Then

$$\frac{MS_w}{15} \sim \frac{\chi^2_{44}}{44}.$$

That is, a value of $MS_w = 10.56$ over 15 is one observation
from a chi-square distribution (df = 44) that has been rescaled by division
by 44.
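
The idea of a distribution of $MS_w$ "over replications" can be made concrete by simulation. The sketch below is an illustration under stated assumptions, not a procedure from the text: the function name `simulate_ms_w`, the seed, the 2,000 replications, and the arbitrary cell means are all hypothetical choices. A chi-square variable with 44 df divided by 44 has mean 1 and variance 2/44, so the simulated $MS_w$ values should average near 15 with variance near $2(15)^2/44$.

```python
import random


def sample_variance(scores):
    """Unbiased sample variance (divisor n - 1)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)


def simulate_ms_w(sigma2_e=15.0, I=2, J=2, n=12, replications=2000, seed=1):
    """Return MS_w from repeated independent replications of an I x J design.

    The cell means are arbitrary: MS_w depends only on within-cell spread.
    """
    rng = random.Random(seed)
    sd = sigma2_e ** 0.5
    values = []
    for _ in range(replications):
        cell_variances = []
        for cell in range(I * J):
            mu_ij = 10.0 * cell                         # hypothetical cell mean
            scores = [rng.gauss(mu_ij, sd) for _ in range(n)]
            cell_variances.append(sample_variance(scores))
        values.append(sum(cell_variances) / (I * J))    # MS_w = mean cell variance
    return values


ms_w_values = simulate_ms_w()
mean_ms_w = sum(ms_w_values) / len(ms_w_values)
var_ms_w = sample_variance(ms_w_values)
print(round(mean_ms_w, 2))   # should be near sigma_e^2 = 15
print(round(var_ms_w, 2))    # theory: 2 * 15**2 / 44, about 10.23
```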

The Distribution of $MS_A$

We must consider two instances: the distribution of $MS_A$ when $H_0$ is true
and when $H_0$ is false. If $H_0$ is true, then

$$\frac{MS_A}{\sigma_e^2} \sim \frac{\chi^2_{I-1}}{I - 1},$$

that is, $MS_A/\sigma_e^2$ has a distribution over complete replications of the
two-factor design that is the chi-square distribution (df = I - 1) divided by
I - 1.

If $H_0$ is false, then $MS_A/\sigma_e^2$ has what is called a noncentral
chi-square distribution with df = I - 1, divided by I - 1. The noncentral
chi-square distribution is a mathematical curve that has a higher mean
and is generally to the right of the chi-square distribution (df = I - 1).
The relationship of the chi-square distribution to the noncentral chi-square
distribution is illustrated in Fig. 17.13.


FIG. 17.13 Distribution of $MS_A/\sigma_e^2$ when $H_0$ is true and when
$H_0$ is false.


When $H_0$ is false, the values of $MS_A$ tend to be larger than those of $MS_A$
when $H_0$ is true. This is reflected in the displacement to the right of the
noncentral chi-square distribution. The larger the value of
$\sum_i \alpha_i^2$, the further to the right of $\chi^2_{I-1}/(I - 1)$ the
values of $MS_A/\sigma_e^2$ will be displaced. Thus, the noncentral chi-square
distribution in Fig. 17.13 is for one particular value of $\sum_i \alpha_i^2$
only; there exists a separate noncentral $\chi^2$ distribution for each value
of $\sum_i \alpha_i^2$.

The Distribution of $MS_B$

The distributional statements that can be made about $MS_B$ are quite analogous
to those for $MS_A$, as you might expect.

If $H_0$: $\sum_j \beta_j^2 = 0$ is true, then

$$\frac{MS_B}{\sigma_e^2} \sim \frac{\chi^2_{J-1}}{J - 1}.$$

If $H_0$ is false, then $MS_B/\sigma_e^2$ has a noncentral chi-square
distribution (df = J - 1) divided by J - 1.

The Distribution of $MS_{AB}$

If $\sum_i \sum_j (\alpha\beta)_{ij}^2 = 0$, i.e., if there is no interaction
between factors A and B, then

$$\frac{MS_{AB}}{\sigma_e^2} \sim \frac{\chi^2_{(I-1)(J-1)}}{(I - 1)(J - 1)},$$

a chi-square distribution, df = (I - 1)(J - 1), divided by (I - 1)(J - 1).
Again, if the null hypothesis about the interaction of A and B is false,
$MS_{AB}/\sigma_e^2$ has a noncentral chi-square distribution,
df = (I - 1)(J - 1), divided by (I - 1)(J - 1).


We are now prepared to combine the above facts into the major results
of this section. Recall that the ratio of two independent chi-square variables,
each divided by its own degrees of freedom, has an F-distribution.

Suppose that $H_0$: $\sum_i \alpha_i^2 = 0$ is true; then
$MS_A/\sigma_e^2 \sim \chi^2_{I-1}/(I - 1)$. Regardless of whether $H_0$ is
true or false, $MS_w/\sigma_e^2$ has a chi-square distribution divided by
IJ(n - 1). Now

$$\frac{MS_A/\sigma_e^2}{MS_w/\sigma_e^2} \sim \frac{\chi^2_{I-1}/(I - 1)}{\chi^2_{IJ(n-1)}/[IJ(n - 1)]} = F_{I-1,\,IJ(n-1)}.$$

But notice that

$$\frac{MS_A/\sigma_e^2}{MS_w/\sigma_e^2} = \frac{MS_A}{MS_w}.$$

(Luckily we have eliminated $\sigma_e^2$ without having to know its actual
value.)

What we have shown is that the ratio of $MS_A$ to $MS_w$ has an F-distribution
with I - 1 and IJ(n - 1) degrees of freedom when $H_0$ is true. Thus if
the value of $MS_A/MS_w$ for a replication of a two-factor experiment looks like
a “typical” observation from the distribution $F_{I-1,\,IJ(n-1)}$ (by “typical”
we mean that $MS_A/MS_w$ does not exceed the 90th or 95th or 99th percentile,
say, of that distribution), we are inclined to think that $H_0$ is true. On the
other hand, if $H_0$ is false, then we expect the value of $MS_A$ to be larger
than the value of $MS_w$. Hence, if we obtain a very large value of
$MS_A/MS_w$, a value that seems not to have been drawn from
$F_{I-1,\,IJ(n-1)}$ since it exceeds the 99th percentile, say, of that
distribution, then we think that $H_0$ is probably false.

When $H_0$: $\sum_j \beta_j^2 = 0$ is true,

$$\frac{MS_B/\sigma_e^2}{MS_w/\sigma_e^2} = \frac{MS_B}{MS_w} \sim F_{J-1,\,IJ(n-1)}.$$

When $H_0$: $\sum_i \sum_j (\alpha\beta)_{ij}^2 = 0$ is true, then

$$\frac{MS_{AB}/\sigma_e^2}{MS_w/\sigma_e^2} = \frac{MS_{AB}}{MS_w} \sim F_{(I-1)(J-1),\,IJ(n-1)}.$$

The effect of a false null hypothesis about either the main effects of
factor B or the interaction effects of A and B is to increase $MS_B$ or
$MS_{AB}$ without systematically increasing $MS_w$, thus producing a
distribution, over complete replications of the design, of ratios of mean
squares that is displaced to the right of the F-distribution. These
relationships are illustrated for factor B in Fig. 17.14.
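
A small simulation can show the ratio of mean squares behaving as described when the null hypothesis is true and shifting to the right when it is false. The sketch below does this for factor A in a 2 × 2 design with n = 12. Everything named here is an assumption made for illustration: the function `simulate_f_a`, the seed, the 4,000 replications, the baseline mean of 50, and the main effects of -2 and 2 used for the false-null case. The cutoff 4.06 is the tabled 95th percentile of the F-distribution with 1 and 44 degrees of freedom.

```python
import random


def simulate_f_a(I=2, J=2, n=12, sigma2_e=15.0, alphas=(0.0, 0.0),
                 replications=4000, seed=7):
    """Return F_A = MS_A / MS_w from repeated replications of a balanced design.

    alphas holds the main effects of factor A; all zeros means H0 is true.
    """
    rng = random.Random(seed)
    sd = sigma2_e ** 0.5
    ratios = []
    for _ in range(replications):
        row_totals, ss_w, grand = [], 0.0, 0.0
        for i in range(I):
            row_total = 0.0
            for j in range(J):
                scores = [rng.gauss(50.0 + alphas[i], sd) for _ in range(n)]
                cell_mean = sum(scores) / n
                ss_w += sum((x - cell_mean) ** 2 for x in scores)
                row_total += sum(scores)
            row_totals.append(row_total)
            grand += row_total
        ss_a = sum(t * t for t in row_totals) / (J * n) - grand ** 2 / (I * J * n)
        ms_a = ss_a / (I - 1)
        ms_w = ss_w / (I * J * (n - 1))
        ratios.append(ms_a / ms_w)
    return ratios


critical = 4.06   # 95th percentile of F with 1 and 44 df
null_ratios = simulate_f_a()
false_ratios = simulate_f_a(alphas=(-2.0, 2.0))
share_null = sum(f > critical for f in null_ratios) / len(null_ratios)
share_false = sum(f > critical for f in false_ratios) / len(false_ratios)
print(share_null)    # should be close to .05
print(share_false)   # should be far larger
```

About 5% of the simulated ratios should exceed the cutoff when every main effect is zero; with nonzero effects the whole distribution of ratios moves to the right, so the share above the cutoff rises sharply.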


FIG. 17.14 Distribution of $MS_B/MS_w$ over complete replications of the
two-factor design when $H_0$ is true and when it is false. (Shaded portion
contains the 5% of the ratios that exceed the 95th percentile in the
F-distribution.)


17.12 

HYPOTHESIS TESTS OF THE 
NULL HYPOTHESES 

The discussion to this point has led us to three ratios of mean squares that
will be called F-ratios:

$$F_A = \frac{MS_A}{MS_w}, \qquad F_B = \frac{MS_B}{MS_w}, \qquad F_{AB} = \frac{MS_{AB}}{MS_w}.$$

For the data in Table 17.3 these F-ratios have the following values:

$$F_A = \frac{1598.52}{10.56} = 151.38, \qquad F_B = \frac{1463.02}{10.56} = 138.54, \qquad F_{AB} = \frac{188.02}{10.56} = 17.80.$$

We shall now point out how statistical tests of the three null hypotheses
can be made using the F-ratios. These F-tests are similar to the F-test in the
one-factor fixed-effects ANOVA. The F-test will be illustrated with the main
effects of factor A.

First, one adopts a level of significance $\alpha$, which is, of course, the
probability of rejecting $H_0$: $\sum_i \alpha_i^2 = 0$ when it is in fact true.
The $\alpha$ so chosen determines a critical region, i.e., values of the ratio
$MS_A/MS_w$ that will lead one to reject as true $H_0$:
$\sum_i \alpha_i^2 = 0$. This critical region is all numbers greater than the
100(1 - $\alpha$) percentile in the distribution $F_{I-1,\,IJ(n-1)}$, i.e., all
values larger than ${}_{1-\alpha}F_{I-1,\,IJ(n-1)}$. If the calculated value of
$F_A = MS_A/MS_w$ exceeds the critical value
${}_{1-\alpha}F_{I-1,\,IJ(n-1)}$, then $H_0$ is rejected. If $F_A$ is less than
the critical value, $H_0$ is not rejected.
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Let’s go back to the example in Fig. 17.12 to illustrate the hypothesis 
tests. There we found that $F_A = 151.38$. If $H_0$: $\sum_i \alpha_i^2 = 0$
were true, the distribution of $F_A$ over repeated complete replications of the
2 × 2 design would describe an F-distribution with I - 1 = 1 and
IJ(n - 1) = 44 degrees of freedom. We would not wish to conclude erroneously
that $H_0$ is false when in fact it is true. Indeed, we wish to adopt a
decision rule for choosing between $H_0$ and $H_1$: $\sum_i \alpha_i^2 \neq 0$
that will lead us to decide erroneously to reject $H_0$ in favor of $H_1$ only
one time in 100, say. Hence, we want to adopt a risk of $\alpha = .01$ of
committing a type I error, rejecting $H_0$ when it is true. Since the only
evidence in favor of $H_1$ is a large value of $F_A$, we shall place the entire
critical region of the test in the upper tail of the distribution $F_{1,44}$;
hence, the critical value for the test becomes ${}_{.99}F_{1,44}$. From Table E
in Appendix A, we can determine that the 99th percentile in the F-distribution
with 1 and 44 degrees of freedom is approximately 7.25.

Any F-ratio $F_A$ exceeding 7.25 will be taken to be evidence that the
hypothesis $H_0$: $\sum_i \alpha_i^2 = 0$ is false. This statement constitutes
the decision rule of the hypothesis test. If in reality $H_0$ is true, this
decision rule will have a probability of $\alpha = .01$ of falsely rejecting
$H_0$. Such is the magnitude of the risk one takes in agreeing to reject $H_0$
if $F_A$ is greater than 7.25.

For the data in Fig. 17.12 the value of $F_A$ is 151.38. Since this F-ratio
exceeds the critical value of 7.25, the hypothesis that
$\sum_i \alpha_i^2 = 0$ is rejected. One concludes that the two sample means
$\bar{X}_{1..}$ and $\bar{X}_{2..}$ are significantly different at the .01
level.

The F-test of the hypothesis $\sum_j \beta_j^2 = 0$ proceeds along similar
lines. Suppose it was decided to test $H_0$ with an $\alpha$ of .05. From
Table E we see that the 95th percentile in the F-distribution with J - 1 = 1
and IJ(n - 1) = 44 degrees of freedom is 4.06. Hence, if $H_0$ is true, the
chances of obtaining a value of $F_B$ exceeding 4.06 are 1 in 20. The chances
of obtaining an $F_B$ of 138.54 or greater when $\sum_j \beta_j^2 = 0$ are
infinitesimally small. Thus, rather than regard $F_B$ as a one-in-a-million
occurrence that just so happened to occur even though $H_0$ was true, we take
the more logical course and regard it as evidence that $H_0$ is false. Our
conclusion is stated formally as “$H_0$: $\sum_j \beta_j^2 = 0$ is rejected
with an $\alpha$ of .05.” Of course, this implies that we are deciding in
favor of the conclusion that $\beta_1 \neq \beta_2$, i.e., that the means for
the two media (classroom lecture vs. programmed instruction) of the
populations sampled are different.

The F-test of the null hypothesis about the interaction of factors A and
B is carried out in a similar manner. If $H_0$:
$\sum_i \sum_j (\alpha\beta)_{ij}^2 = 0$ is true, then $F_{AB}$ has an
F-distribution with (I - 1)(J - 1) = 1 and IJ(n - 1) = 44 degrees of freedom.
Suppose that we wish to run a test of $H_0$ with a risk of .01 of rejecting
$H_0$ when it is actually true. The critical value that $F_{AB}$ must exceed to
be counted as evidence of a false $H_0$ is 7.25, the 99th percentile in the
F-distribution with 1 and 44 degrees of freedom.

The value of $F_{AB}$ is 188.02/10.56 = 17.80. The position of this obtained
F-ratio relative to the F-distribution with 1 and 44 df (the distribution
$F_{AB}$ would follow if $H_0$ were true) is illustrated in Fig. 17.15.


FIG. 17.15 Position of an F-ratio of 17.80 relative to the F-distribution
with one and 44 degrees of freedom.

By inspecting Fig. 17.15, it would appear to be foolish to regard 17.80
as having been drawn at random from the distribution $F_{1,44}$, but to argue
that the data support the null hypothesis that
$\sum_i \sum_j (\alpha\beta)_{ij}^2 = 0$ would be equivalent to doing so.
Thus, the conclusion is to reject $H_0$, and the probability that
this conclusion is erroneous is far less than .01.

Since a significant interaction of A and B has been found, it will be 
illuminating to graph it. The four cell means for the data in Fig. 17.12 are 
6.00 for traditional method and classroom lecture, 13.08 for traditional 
method and programmed instruction, 13.58 for modern method and class- 
room lecture, and 28.58 for modern method and programmed instruction. 
The four cell means are depicted in Fig. 17.16. 

The departure from parallelism of the two lines in Fig. 17.16 reflects the
interaction of method with media. The departure of the two lines from
parallelism was judged by the F-test to be overwhelmingly statistically
significant, i.e., there is hardly the slightest chance that graphing the
unknown population means would reveal parallel lines. The interpretation of
this interaction is that the modern teaching method accentuated the
superiority of programmed instruction over the classroom lecture, relative to
what that superiority was for the traditional method. Programmed instruction
is about seven points better than classroom lecture under the traditional
method; but it is 15 points better with the modern method. Perhaps the modern
method lent itself to programming better than did the traditional method.


FIG. 17.16 Graph of the interaction of factors A and B for the data in
Fig. 17.12.
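
The "seven points" and "15 points" quoted above are differences between cell means, and the interaction is the difference between those two differences. A minimal sketch follows; the dictionary keys and the helper name `simple_effect` are illustrative only.

```python
# Cell means reported in the text for the data in Fig. 17.12.
cell_means = {
    ("traditional", "lecture"): 6.00,
    ("traditional", "programmed"): 13.08,
    ("modern", "lecture"): 13.58,
    ("modern", "programmed"): 28.58,
}


def simple_effect(method):
    """Advantage of programmed instruction over lecture within one method."""
    return cell_means[method, "programmed"] - cell_means[method, "lecture"]


traditional_gap = simple_effect("traditional")
modern_gap = simple_effect("modern")
# With no interaction the two gaps would be equal and the lines parallel.
interaction_contrast = modern_gap - traditional_gap

print(round(traditional_gap, 2), round(modern_gap, 2), round(interaction_contrast, 2))
```

The two gaps come to about 7.08 and 15.00 points, so the lines in a plot of the cell means diverge by roughly 7.9 points between the traditional and the modern method.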


17.13 

REVIEW OF THE TWO-FACTOR 
ANOVA WITH EQUAL 
NUMBERS OF OBSERVATIONS 
PER CELL 


In outline form, the two-factor fixed-effects model ANOVA with n
observations per cell is performed as follows:


1. The following model is postulated for the data:
$X_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$.

2. It is assumed that the n observations in any one of the IJ cells come
from a normal distribution with variance $\sigma_e^2$, which is constant for
each cell, and that the IJ samples are independent of one another.

3. A level of significance $\alpha$ is adopted for each of the three F-tests:
the tests of the null hypothesis for the main effects of factor A and
factor B and of the interaction effect of A and B.

4. The three critical values for the F-tests are determined by reference
to Table E in Appendix A. These critical values are as follows:

For $F_A$: ${}_{1-\alpha}F_{I-1,\,IJ(n-1)}$

For $F_B$: ${}_{1-\alpha}F_{J-1,\,IJ(n-1)}$

For $F_{AB}$: ${}_{1-\alpha}F_{(I-1)(J-1),\,IJ(n-1)}$

5. The four mean squares, $MS_A$, $MS_B$, $MS_{AB}$, and $MS_w$, are
calculated from the computational formulas for sums of squares in
Table 17.2 and the degrees of freedom.

6. The three F-ratios are calculated:

$$F_A = \frac{MS_A}{MS_w}, \qquad F_B = \frac{MS_B}{MS_w}, \qquad F_{AB} = \frac{MS_{AB}}{MS_w}.$$

7. The F-ratios in step 6 are compared with the corresponding critical
values found in step 4. If an F-ratio exceeds the critical value, the
corresponding null hypothesis is rejected at the $\alpha$-level of
significance. If an F-ratio does not exceed the critical value, the
corresponding null hypothesis is not rejected.

8. The results are tabulated as illustrated in Table 17.4.
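
Steps 6 and 7 of the outline above can be sketched as a small routine. The function name `f_tests` and the dictionary keys are hypothetical. The critical values are supplied by hand from the tabled percentile quoted in Sec. 17.12; applying the .01 level to all three tests is an assumption made for this illustration (the worked example in Sec. 17.12 tested factor B at the .05 level).

```python
def f_tests(mean_squares, critical_values):
    """Steps 6 and 7: form each F-ratio and compare it with its critical value."""
    ms_w = mean_squares["within"]
    results = {}
    for source, critical in critical_values.items():
        f_ratio = mean_squares[source] / ms_w
        results[source] = (f_ratio, f_ratio > critical)
    return results


# Step 5 output for the methods x media example (Table 17.3).
mean_squares = {"A": 1598.52, "B": 1463.02, "AB": 188.02, "within": 10.56}
# Step 4: 99th percentile of F with 1 and 44 df, used here for all three tests.
critical_values = {"A": 7.25, "B": 7.25, "AB": 7.25}

results = f_tests(mean_squares, critical_values)
for source, (f_ratio, reject) in results.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"F_{source} = {f_ratio:.2f}: {verdict}")
```

Keeping the critical values as explicit inputs mirrors the outline, where they come from a printed table rather than from computation.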


TABLE 17.4 TABULATION OF RESULTS OF TWO-FACTOR FIXED-EFFECTS
MODEL ANOVA

Source of variation        df      MS         F          p

Factor A (methods)         1       1598.52    151.38*    p < .001
Factor B (media)           1       1463.02    138.54*    p < .001
A × B                      1       188.02     17.80*     p < .001
Within cells               44      10.56


* F-ratio significant at the $\alpha$ = .01 level.

Probability values appear in the last column of Table 17.4. It is
becoming customary to report with each F-ratio the proportion, p, of the area
in the F-distribution that lies to the right of the obtained F-ratio. For
example, if $F_{AB}$ had been 3.47, one would have indicated that p was
between .10 and .05, i.e., that less than 10% but more than 5% of the area in
the distribution $F_{1,44}$ lies above 3.47. One would know immediately that
the null hypothesis about interaction could be rejected with an $\alpha$ of
.10 but not with an $\alpha$ of .05.
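
Where no table is at hand, the area to the right of an obtained F-ratio can be computed from the regularized incomplete beta function, using the identity $P(F > f) = I_x(df_2/2,\, df_1/2)$ with $x = df_2/(df_2 + df_1 f)$. The sketch below uses only the standard library and evaluates the incomplete beta function by a continued fraction. It is not the method of the text, which relies on Table E; the function names are hypothetical.

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for numerator in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + numerator * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + numerator / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:      # last correction factor is ~1: converged
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F-distribution with df1 and df2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


p = f_upper_tail(3.47, 1, 44)
print(0.05 < p < 0.10)
print(f_upper_tail(17.80, 1, 44) < 0.001)
```

For the hypothetical $F_{AB}$ of 3.47 the computed p should fall between .05 and .10, matching the statement above.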


17.14

TWO-FACTOR FIXED-EFFECTS
ANOVA WITH UNEQUAL n's


You will probably be surprised to learn that the generalization of the
two-factor ANOVA from the equal n's case to that of unequal n's is not as
simple and straightforward as it is in the case of the one-factor ANOVA. Of
the totality of two-factor layouts of data with unequal n's, one subclass
presents no real difficulties in either theory or application. Unfortunately,
the other class does present problems. The remainder of this section is
divided into two parts corresponding to the two classes of conditions on the
unequal n's.


Proportional Cell Frequencies 

Now that we are considering the possibility of different cells having
different numbers of observations, we must adopt a new system for denoting
such numbers. Up to this point, the common number of observations in the IJ
cells of a two-factor design has been denoted by n. Henceforth, the number of
observations in the ijth cell of such a design will be denoted by $n_{ij}$.
Consider the 3 × 2 design (I = 3, J = 2) for which the n's are shown in
Fig. 17.17.


Factor A (i)       Factor B (j)                       Row sums
                   1                 2

1                  $n_{11} = 2$      $n_{12} = 4$      $n_{1.} = 6$
2                  $n_{21} = 3$      $n_{22} = 6$      $n_{2.} = 9$
3                  $n_{31} = 2$      $n_{32} = 4$      $n_{3.} = 6$

Column sums        $n_{.1} = 7$      $n_{.2} = 14$     $n_{..} = 21$


FIG. 17.17 Notation for factorial designs where the $n_{ij}$'s are not equal.


The number of observations in the cell at the intersection of the first
row (i = 1) and first column (j = 1) is $n_{11} = 2$. There are $n_{32} = 4$
observations in the cell at the intersection of row three (i = 3) and column
two (j = 2).

It will be necessary eventually to make use of the numbers of observations
in any one row or column of a two-factor design. Notice in Fig. 17.17
that there are 2 + 3 + 2 = 7 observations in the first column. This number
is found by adding up $n_{ij}$'s as follows:

$$n_{.1} = n_{11} + n_{21} + n_{31} = \sum_{i=1}^{3} n_{i1} = 7.$$

The notation $n_{.1}$ for the total frequencies in the first column conforms to
our usage of the subscripted dot to denote having summed over an index.
In general, the total frequency in the jth column is denoted by $n_{.j}$. The
total number of observations in row 3 is

$$n_{3.} = n_{31} + n_{32} = \sum_{j=1}^{2} n_{3j} = 2 + 4 = 6.$$

In general, there are $n_{i.}$ observations in the ith row.

The term $n_{..}$ denotes $\sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}$, which is the
total number of observations in the entire layout of data. We shall also use
the familiar notation of N for the total number of observations, i.e., by
definition $n_{..} = N$. What is the meaning of $n_{.2}$ and $n_{1.}$?
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Relatively simple computational procedures may be applied to a two- 
factor fixed-effects ANOVA with unequal n's provided the cell frequencies 
are proportional. What does it mean to have proportional cell frequencies?
The cell frequencies are proportional when

$$n_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}} \qquad (17.14)$$

for all IJ values of $n_{ij}$.

In words, the condition of proportionality is satisfied when for each cell
the number of observations in that cell equals the product of the numbers of
observations in the entire row and entire column in which the cell lies,
divided by the total number of observations. Notice that this condition is
satisfied by each of the six values of $n_{ij}$ in Fig. 17.17:

$$n_{11} = \frac{6 \cdot 7}{21} = 2, \qquad n_{12} = \frac{6 \cdot 14}{21} = 4,$$

$$n_{21} = \frac{9 \cdot 7}{21} = 3, \qquad n_{22} = \frac{9 \cdot 14}{21} = 6,$$

$$n_{31} = \frac{6 \cdot 7}{21} = 2, \qquad n_{32} = \frac{6 \cdot 14}{21} = 4.$$

Thus the six cell frequencies in Fig. 17.17 are proportional. "Proportional"
here means that 2:3:2 :: 4:6:4 and 2:4 :: 3:6 :: 2:4. Both of these
are true because the first reduces to "2 is to 3 is to 2, as 2 is to 3 is to 2,"
and the second reduces to "1 is to 2, as 1 is to 2, as 1 is to 2." Thus the
two sets of column frequencies are proportional to each other (or to the
column of row sums), and the three sets of row frequencies are proportional
to each other (or to the row of column sums).
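
The check against Eq. (17.14) is mechanical and easy to automate. A minimal sketch follows; the function name `is_proportional` is hypothetical and the second layout is made up for contrast. It cross-multiplies so that the comparison stays in exact integer arithmetic.

```python
def is_proportional(n):
    """True when every n_ij equals n_i. * n_.j / n.. (Eq. 17.14)."""
    row_totals = [sum(row) for row in n]
    col_totals = [sum(col) for col in zip(*n)]
    total = sum(row_totals)
    # n_ij * n.. == n_i. * n_.j avoids any division and rounding.
    return all(
        n[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(n))
        for j in range(len(n[0]))
    )


fig_17_17 = [[2, 4], [3, 6], [2, 4]]               # rows: factor A, columns: factor B
print(is_proportional(fig_17_17))                  # True
print(is_proportional([[5, 4], [3, 6], [2, 4]]))   # False (hypothetical layout)
```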
Having determined that the cell frequencies are proportional, one may
proceed with the calculations as outlined below. The calculations will be
illustrated on the data (running long-jump distance measured in feet for boys
and girls of three races) in Fig. 17.18. Notice in Fig. 17.18 that $n_{ij}$
replaces n as the upper limit of summation over the observations in the ijth
cell.



event 

(». In lfn r: 0pOn ‘ 0n ° f ,J w » casei 01 ,hc »- cases that lie 

t i ***** for ». EfSli vo'h J ’ ,hcn il «l ual 

colder bu ‘ 


consider each ai bem- r J' V when the «, •« «*’ h . 

ple ’ ,hat n »i *= r,e t = 3(1) = 3, 
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Foctor B (sex) 

Factor A          Girls (1)            Boys (2)                     Row sums    Row means
(race)

Negro (1)         11.5, 15.0           18.2, 11.3, 14.2, 15.9       86.1        14.4

White (2)         13.1, 10.4, 11.9     14.3, 15.3, 11.8, 11.0,      109.2       12.1
                                       10.9, 10.5

Oriental (3)      10.1, 6.9            12.8, 8.4, 10.6, 10.0        58.8        9.8

Column sums       78.9                 175.2                        254.1

Column means      11.3                 12.5                         Grand mean 12.1

(Row sums are $\sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} X_{ijk}$; column sums are
$\sum_{i=1}^{3} \sum_{k=1}^{n_{ij}} X_{ijk}$; the grand sum is
$\sum_{i=1}^{3} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} X_{ijk} = 254.1$.)


FIG. 17.18 Illustrative data for the proportional-cell-frequency design
of Fig. 17.17.


The sum of the $n_{ij}$ observations in the ijth cell is denoted by

$$\sum_{k=1}^{n_{ij}} X_{ijk}.$$

A quantity needed for the two-factor ANOVA is the sum of the squares of the
21 observations in Fig. 17.18:

$$\sum_{i=1}^{3} \sum_{j=1}^{2} \sum_{k=1}^{n_{ij}} X_{ijk}^2 = 11.5^2 + 15.0^2 + \cdots + 10.0^2 = 3216.27.$$
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FIG. 17.19 


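The computational formulas for the four sums of squares bear a close
resemblance to those for the equal n's case. The four computational formulas
appear in Table 17.5.

TABLE 17.5

$$SS_A = \sum_{i=1}^{I} \frac{\left(\sum_{j} \sum_{k} X_{ijk}\right)^2}{n_{i.}} - \frac{\left(\sum_{i} \sum_{j} \sum_{k} X_{ijk}\right)^2}{N}$$

$$SS_B = \sum_{j=1}^{J} \frac{\left(\sum_{i} \sum_{k} X_{ijk}\right)^2}{n_{.j}} - \frac{\left(\sum_{i} \sum_{j} \sum_{k} X_{ijk}\right)^2}{N}$$

$$SS_{AB} = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{\left(\sum_{k} X_{ijk}\right)^2}{n_{ij}} - SS_A - SS_B - \frac{\left(\sum_{i} \sum_{j} \sum_{k} X_{ijk}\right)^2}{N}$$

$$SS_w = \sum_{i} \sum_{j} \sum_{k} X_{ijk}^2 - \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{\left(\sum_{k} X_{ijk}\right)^2}{n_{ij}}$$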

These formulas will now be illustrated with the data in Fig. 17.18:

$$SS_A = \frac{(86.1)^2}{6} + \frac{(109.2)^2}{9} + \frac{(58.8)^2}{6} - \frac{(254.1)^2}{21} = 3136.735 - 3074.610 = 62.125.$$

$$SS_B = \frac{(78.9)^2}{7} + \frac{(175.2)^2}{14} - \frac{(254.1)^2}{21} = 3081.819 - 3074.610 = 7.209.$$

$$SS_{AB} = \frac{(26.5)^2}{2} + \frac{(59.6)^2}{4} + \frac{(35.4)^2}{3} + \frac{(73.8)^2}{6} + \frac{(17.0)^2}{2} + \frac{(41.8)^2}{4} - 62.125 - 7.209 - \frac{(254.1)^2}{21}$$

$$= 3145.935 - 62.125 - 7.209 - 3074.610 = 1.991.$$

$$SS_w = 3216.270 - \left[\frac{(26.5)^2}{2} + \cdots + \frac{(41.8)^2}{4}\right] = 3216.270 - 3145.935 = 70.335.$$

The degrees of freedom and expected values of the four mean squares
for the two-factor ANOVA with unequal n's are given in Table 17.6.

TABLE 17.6 DEGREES OF FREEDOM AND EXPECTED VALUES OF MEAN SQUARES IN
THE TWO-FACTOR ANOVA WITH UNEQUAL n's

Source of variation      df                  E(MS)

Factor A                 I - 1               $\sigma_e^2 + \frac{\sum_i n_{i.} \alpha_i^2}{I - 1}$
Factor B                 J - 1               $\sigma_e^2 + \frac{\sum_j n_{.j} \beta_j^2}{J - 1}$
A × B                    (I - 1)(J - 1)      $\sigma_e^2 + \frac{\sum_i \sum_j n_{ij} (\alpha\beta)_{ij}^2}{(I - 1)(J - 1)}$
Within cells             N - IJ              $\sigma_e^2$


The obtained values of the four mean squares are as follows:

$$MS_A = \frac{62.125}{2} = 31.062, \qquad MS_B = \frac{7.209}{1} = 7.209,$$

$$MS_{AB} = \frac{1.991}{2} = 0.996, \qquad MS_w = \frac{70.335}{15} = 4.689.$$
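
The long-jump computations can be reproduced directly from the raw scores of Fig. 17.18. The sketch below follows the computational formulas for unequal n's, dividing each squared total by the number of scores that entered it. The function name `sums_of_squares_unequal_n` is hypothetical, and the code assumes, without checking, that the cell frequencies are proportional, as they are for these data.

```python
# Long-jump distances from Fig. 17.18, keyed by (race level i, sex level j).
cells = {
    (1, 1): [11.5, 15.0],
    (1, 2): [18.2, 11.3, 14.2, 15.9],
    (2, 1): [13.1, 10.4, 11.9],
    (2, 2): [14.3, 15.3, 11.8, 11.0, 10.9, 10.5],
    (3, 1): [10.1, 6.9],
    (3, 2): [12.8, 8.4, 10.6, 10.0],
}


def sums_of_squares_unequal_n(cells):
    """SS, df, and MS for a two-factor layout with proportional cell frequencies."""
    rows = sorted({i for i, _ in cells})
    cols = sorted({j for _, j in cells})
    N = sum(len(scores) for scores in cells.values())
    grand = sum(sum(scores) for scores in cells.values())
    correction = grand ** 2 / N

    def weighted(groups):
        # Sum over groups of (group total)^2 / (group size).
        return sum(sum(g) ** 2 / len(g) for g in groups)

    by_row = [[x for j in cols for x in cells[i, j]] for i in rows]
    by_col = [[x for i in rows for x in cells[i, j]] for j in cols]
    between_cells = weighted(cells.values())

    ss = {"A": weighted(by_row) - correction, "B": weighted(by_col) - correction}
    ss["AB"] = between_cells - ss["A"] - ss["B"] - correction
    ss["W"] = sum(x * x for s in cells.values() for x in s) - between_cells
    df = {"A": len(rows) - 1, "B": len(cols) - 1,
          "AB": (len(rows) - 1) * (len(cols) - 1), "W": N - len(cells)}
    ms = {source: ss[source] / df[source] for source in ss}
    return ss, df, ms


ss, df, ms = sums_of_squares_unequal_n(cells)
for source in ("A", "B", "AB", "W"):
    print(f"{source:>2}  df={df[source]:>2}  SS={ss[source]:7.3f}  MS={ms[source]:7.3f}")
```

The printed values should agree with the hand calculations above to within rounding in the third decimal.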


The entries in Table 17.6 possess only minor differences from their
counterparts in the equal n's ANOVA. With the unequal n's ANOVA, we
write the degrees of freedom within cells as N - IJ instead of IJ(n - 1);
notice, however, that when all $n_{ij} = n$, N - IJ = IJn - IJ = IJ(n - 1).
The E(MS)'s differ from those in the equal n's case only in that each
effect (main or interaction) is differentially weighted by the
number of observations available to estimate it in the layout of data.
Unequal cell frequencies are the general case, of which

$$n_{.1} = n_{.2} = \cdots = n_{.J}, \qquad n_{1.} = n_{2.} = \cdots = n_{I.}, \qquad n_{11} = n_{12} = \cdots = n_{IJ}$$

(i.e., equal column frequencies, equal row frequencies, or equal frequencies
for each of the IJ factor-level combinations) are special cases that permit
the “weights” ($n_{i.}$, $n_{.j}$, or $n_{ij}$) to be moved to the left of one
or both of the summation signs. Note, for example, that in Table 17.6 if the
$n_{i.}$'s are all equal, say, to c, the E(MS) for factor A becomes

$$\sigma_e^2 + \frac{c \sum_i \alpha_i^2}{I - 1}.$$
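Inspection of the E(MS)'s in Table 17.6 indicates appropriate F-tests
of the three null hypotheses:

$F_A = MS_A/MS_w$ tests $H_0$: $\sum_i \alpha_i^2 = 0$.

$F_B = MS_B/MS_w$ tests $H_0$: $\sum_j \beta_j^2 = 0$.

$F_{AB} = MS_{AB}/MS_w$ tests $H_0$: $\sum_i \sum_j (\alpha\beta)_{ij}^2 = 0$.

(Notice that if $\sum_i \alpha_i^2 = 0$, then $\sum_i n_{i.} \alpha_i^2$ is
also equal to zero since if the first condition holds then all I $\alpha_i$'s
are zero. The same is true for the $\beta$'s and $(\alpha\beta)$'s.)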
If $H_0$: $\sum_i \alpha_i^2 = 0$ is true, then $MS_A = SS_A/(I - 1)$ will be
an unbiased estimator of $\sigma_e^2$. Regardless of whether $H_0$ is true or
false, $MS_w = SS_w/(N - IJ)$ is an unbiased estimator of $\sigma_e^2$ and is
independent of $MS_A$. If $H_0$ is true, $F_A = MS_A/MS_w$ will follow the
F-distribution with I - 1 and N - IJ degrees of freedom.

For the example data in Fig. 17.18, $F_A$ = 31.062/4.689 = 6.62. The critical
value for $F_A$ with $\alpha$ = .01 is ${}_{.99}F_{2,15}$ = 6.36. Since 6.62
exceeds 6.36, $H_0$: $\sum_i \alpha_i^2 = 0$ is rejected.

The critical value for $F_B$ is ${}_{.99}F_{1,15}$ = 8.68. The value of $F_B$
is 7.209/4.689 = 1.54. Thus the data do not support rejection of $H_0$:
$\sum_j \beta_j^2 = 0$. There is insufficient evidence for the conclusion that
boys and girls of this age differ in long-jumping ability.

The critical value for $F_{AB}$ is ${}_{.99}F_{2,15}$ = 6.36. The value of
$F_{AB}$ is 0.996/4.689 = 0.21. Thus the data do not support rejection of the
hypothesis of no interaction of race and sex.

The results may be tabulated as in Table 17.7.

TABLE 17.7 RESULTS OF THE TWO-FACTOR ANOVA OF THE DATA IN FIGURE 17.18

Source of variation        df      MS        F       p

Race (A)                   2       31.062    6.62    p < 0.01
Sex (B)                    1       7.209     1.54    p > 0.25
Race × sex (A × B)         2       0.996     0.21    p > 0.50
Within race-sex groups     15      4.689




Disproportional Cell Frequencies

The numbers of observations in the cells of the two-factor layout shown in 
Fig. 17.20 are disproportional. Notice, for example, that $n_{11}$ is not
equal to $n_{1.} n_{.1} / n_{..}$:

$$n_{11} = 5 \neq \frac{n_{1.}\, n_{.1}}{n_{..}} = 4.4.$$

Though it is by no means readily apparent, proportional cell frequencies
can be achieved in the layout of Fig. 17.20 by the simple expedient of
randomly discarding one of the five scores in the cell in row 1 and column 1
and one of the five scores in the cell in row 2 and column 3. After so doing,
the cell frequencies are proportional (see Fig. 17.21). (You may check the
cell frequencies of Fig. 17.21 against Eq. (17.14) to satisfy yourself that they
are actually proportional.)


FIG. 17.20


FIG. 17.21


When proportional cell frequencies can be achieved from disproportional
frequencies by randomly discarding only a few observations from the
total layout, then by all means one should do so. Avoiding the computational
labor necessary for the analysis of disproportional designs is worth the trivial
reduction in power resulting from discarding 5%, say, of the data. When
proportional cell frequencies cannot be easily or inexpensively achieved (in
terms of power of the F-test) by randomly discarding data from various cells,
the analysis that will now be presented is recommended.

The unweighted means analysis is probably the simplest and one of the 
most justtliable techniques for analyzing disproportional designs. First 
we shall outline bneily the rationale for the unweighted means analysis; 
then this technique will be illustrated. 

Suppose that data are gathered in a two-factor design with disproportional
cell frequencies. Next, consider replacing the $n_{ij}$ observations in
any one cell by the mean of those $n_{ij}$ scores. Now we have $IJ$ cells in a
two-factor design with one observation per cell, the mean of the original
$n_{ij}$ scores in the cell. Surely this new layout of data has proportional cell
frequencies; indeed, the $n$'s are all equal to 1. (Primes will mark quantities
calculated from this layout of cell means, to distinguish them from those based
on the original individual observations.) However, there exist no degrees of
freedom within cells when $n = 1$; hence we cannot calculate a $MS'_w$ from the
two-factor array of cell means.

What should be the value of $MS'_w$? Any $MS'_w$ must estimate the variance of
the normal population of the observation (or observations) in any one of the
$IJ$ cells. Since we assume that the $n_{ij}$ observations in the $ij$th cell come
from a normal population with variance $\sigma^2$, the variance of $\bar{X}_{ij}$
is $\sigma^2/n_{ij}$ by a well-known theorem. We can see from $\sigma^2/n_{ij}$
that the population variances of the cell means are heterogeneous (since $n_{ij}$
differs from cell to cell). However, when the $n$'s are equal (as they are in the
case of the layout of $IJ$ cell means), it has been shown that the F-tests operate
as they would if all of the variances were equal and equalled the average of the
$IJ$ cell variances (see [...], chap. 10). Hence, if we can find an estimate of the
average variance of the $IJ$ cell means, we can use it as the denominator of any
F-ratio.


The average variance of the $\bar{X}_{ij}$ is

$$\frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{\sigma^2}{n_{ij}},$$

which equals

$$\sigma^2 \cdot \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{1}{n_{ij}}.$$

We can estimate $\sigma^2$ by $MS_w$, the mean square within for the original
layout of $N$ individual observations. Hence, we take

$$MS'_w = MS_w \cdot \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{1}{n_{ij}}. \qquad (17.15)$$


Denote the multiplier of $MS_w$ in Eq. (17.15) by $c$; $c$ is the average of the
reciprocals of the numbers of observations in the $IJ$ cells. $MS'_w$ is
appropriate as a denominator for testing $MS'_A$, $MS'_B$, and $MS'_{AB}$.

The unweighted means analysis is illustrated on the data in Fig. 17.22.
These hypothetical data can be thought of as having been gathered in an
[...] Binet Intelligence Test.

1. Original Data Evidencing Disproportional Cell Frequencies.

                        Factor B - Glutamic acid dosage
Factor A - Diagnosis    1                  2 (Half)                 3 (Full dosage)

1 Normal                92, 114, 107       101, 118, 98, 96, 105    89, 120, 110, 115
                        n11 = 3            n12 = 5                  n13 = 4

2 Brain damaged         91, 74, 65, 90     101, 68, 59              79, 88, 55, 67, 93
                        n21 = 4            n22 = 3                  n23 = 5

2. Layout of Cell Means.

                        Factor B
Factor A                1          2          3          Row totals    Row means

1 Normal                104.33     103.60     108.50     316.43        105.48
2 Brain damaged         80.00      76.00      76.40      232.40        77.47

Column totals           184.33     179.60     184.90     548.83
Column means            92.17      89.80      92.45

FIG. 17.22 Layout of data for the unweighted means analysis.

The data in the top half of Fig. 17.22 are transformed into the layout of
cell means in the bottom half. The bottom half of Fig. 17.22 is now regarded
as a two-factor ANOVA with n = 1 observation per cell. The sums of
squares are calculated as follows:


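$$SS'_A = \sum_{i=1}^{I} \frac{\left(\sum_{j=1}^{J} \bar{X}_{ij}\right)^2}{J} - \frac{\left(\sum_{i=1}^{I}\sum_{j=1}^{J} \bar{X}_{ij}\right)^2}{IJ} = \frac{(316.43)^2 + (232.40)^2}{3} - \frac{(548.83)^2}{6}$$

$$= 51,379.235 - 50,202.395 = 1176.840.$$

$$SS'_B = \sum_{j=1}^{J} \frac{\left(\sum_{i=1}^{I} \bar{X}_{ij}\right)^2}{I} - \frac{\left(\sum_{i=1}^{I}\sum_{j=1}^{J} \bar{X}_{ij}\right)^2}{IJ} = \frac{(184.33)^2 + (179.60)^2 + (184.90)^2}{2} - \frac{(548.83)^2}{6}$$

$$= 50,210.860 - 50,202.395 = 8.465.$$

$$SS'_{AB} = \sum_{i=1}^{I}\sum_{j=1}^{J} \bar{X}_{ij}^2 - SS'_A - SS'_B - \frac{\left(\sum_{i=1}^{I}\sum_{j=1}^{J} \bar{X}_{ij}\right)^2}{IJ}$$

$$= (104.33)^2 + (103.60)^2 + \ldots + (76.40)^2 - 1176.840 - 8.465 - \frac{(548.83)^2}{6}$$

$$= 51,402.919 - 1176.840 - 8.465 - 50,202.395 = 15.219.$$

$$SS_w = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{n_{ij}} X_{ijk}^2 - \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\left(\sum_{k=1}^{n_{ij}} X_{ijk}\right)^2}{n_{ij}}$$

$$= 92^2 + 114^2 + 107^2 + \ldots + 93^2 - \left[\frac{(313)^2}{3} + \ldots + \frac{(382)^2}{5}\right] = 209,061.000 - 205,522.933 = 3538.067.$$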

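The value of $c$ is given by

$$c = \frac{1}{IJ} \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{1}{n_{ij}} = \frac{\frac{1}{3} + \frac{1}{5} + \frac{1}{4} + \frac{1}{4} + \frac{1}{3} + \frac{1}{5}}{6} = 0.2611.$$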

Now we are prepared to calculate the mean squares and F-ratios:

$$MS'_A = \frac{SS'_A}{1} = \frac{1176.840}{1} = 1176.840.$$

$$MS'_B = \frac{SS'_B}{2} = \frac{8.465}{2} = 4.233.$$

$$MS'_{AB} = \frac{SS'_{AB}}{2} = \frac{15.219}{2} = 7.610.$$

$$MS'_w = c \cdot \frac{SS_w}{18} = 0.2611 \cdot \frac{3538.067}{18} = 0.2611(196.559) = 51.322.$$

The F-ratios for testing the three null hypotheses are:

$$F_A = \frac{MS'_A}{MS'_w} = \frac{1176.840}{51.322} = 22.93.$$

$$F_B = \frac{MS'_B}{MS'_w} = \frac{4.233}{51.322} = 0.08.$$

$$F_{AB} = \frac{MS'_{AB}}{MS'_w} = \frac{7.610}{51.322} = 0.15.$$

Quite obviously there exists no evidence to conclude a significant main
effect for factor B or a significant interaction of A and B. The
probability p of obtaining an F as large as 22.93 or larger if the null
hypothesis for factor A is true is less than .01. Hence, the null hypothesis
for factor A is rejected.
The results of the unweighted means analysis can be presented as is
usual for a two-factor ANOVA:


Source of variation          df     MS          F        p

Diagnosis (A)                1      1176.840    22.93    p < .01
Glutamic acid dosage (B)     2      4.233       0.08     p > .50
A x B                        2      7.610       0.15     p > .50
Within cells                 18     51.322
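
The arithmetic of the unweighted means analysis can be retraced with a short script. The following is a minimal Python sketch, not part of the original text: the function name and dictionary keys are illustrative assumptions, and the data are the hypothetical scores of Fig. 17.22. Because it carries the cell means at full precision rather than rounded to two decimals, its sums of squares differ slightly from the hand calculations, while the F-ratios agree to two decimals.

```python
# Rows are levels of factor A (diagnosis); columns are levels of factor B (dosage).
cells = [
    [[92, 114, 107], [101, 118, 98, 96, 105], [89, 120, 110, 115]],
    [[91, 74, 65, 90], [101, 68, 59], [79, 88, 55, 67, 93]],
]


def unweighted_means_anova(cells):
    """Two-factor unweighted means analysis for unequal cell sizes."""
    n_rows, n_cols = len(cells), len(cells[0])
    means = [[sum(c) / len(c) for c in row] for row in cells]

    # Treat the cell means as a layout with one observation per cell.
    grand_total = sum(sum(row) for row in means)
    correction = grand_total ** 2 / (n_rows * n_cols)
    row_totals = [sum(row) for row in means]
    col_totals = [sum(means[i][j] for i in range(n_rows)) for j in range(n_cols)]
    ss_a = sum(t ** 2 for t in row_totals) / n_cols - correction
    ss_b = sum(t ** 2 for t in col_totals) / n_rows - correction
    ss_ab = sum(m ** 2 for row in means for m in row) - correction - ss_a - ss_b

    # Within-cells sum of squares comes from the original observations.
    ss_w = sum(
        sum(x ** 2 for x in c) - sum(c) ** 2 / len(c) for row in cells for c in row
    )
    n_total = sum(len(c) for row in cells for c in row)
    df_w = n_total - n_rows * n_cols

    # c is the average reciprocal cell size; it rescales MS_w as in Eq. (17.15).
    c = sum(1 / len(cell) for row in cells for cell in row) / (n_rows * n_cols)
    ms_w_prime = c * ss_w / df_w

    ms_a = ss_a / (n_rows - 1)
    ms_b = ss_b / (n_cols - 1)
    ms_ab = ss_ab / ((n_rows - 1) * (n_cols - 1))
    return {
        "c": c,
        "df_w": df_w,
        "MS_w_prime": ms_w_prime,
        "F_A": ms_a / ms_w_prime,
        "F_B": ms_b / ms_w_prime,
        "F_AB": ms_ab / ms_w_prime,
    }


result = unweighted_means_anova(cells)
for name, value in result.items():
    print(name, round(value, 4))
```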




MULTIPLE COMPARISONS IN 
THE TWO-FACTOR ANOVA 


The rejection of a null hypothesis about the main effect of factor A implies
only that at least two of the $I$ population means differ. All $I$ population
means may differ, only two may differ, or any number in between may
differ; the F-test does not distinguish among these possibilities. As was true
with the one-factor ANOVA, multiple-comparisons procedures are required
to determine which of the pairs of $I$ sample means show differences large
enough to permit the conclusion that the underlying population means
differ. The above remarks apply to factor B as well. The Tukey and
Scheffé methods of multiple comparisons were presented in Chapter 16. In
this section, we shall indicate how each method would typically be employed
in the two-factor case.


When the number of observations upon which each of the $I$ sample
means is based is equal for all $I$ means, and when one is interested
only in the pairwise differences between $I$ population means, the Tukey
method may be used. The set of simultaneous confidence intervals
with confidence coefficient $1 - \alpha$ is constructed as follows:

$$(\bar{X}_{i..} - \bar{X}_{i^*..}) \pm {}_{1-\alpha}q_{I,\,N-IJ} \sqrt{\frac{MS_w}{N/I}}, \qquad (17.16)$$

where ${}_{1-\alpha}q_{I,\,N-IJ}$ is the $100(1 - \alpha)$ percentile point in the
Studentized range distribution with $I$ and $N - IJ$ degrees of freedom, $MS_w$ is
the within-cells mean square of the two-factor ANOVA, and $N/I$ is the number of
observations on which each of the $I$ sample means is based. (Recall that the
number of observations must be equal in each group to use the Tukey
method.) When the $n$'s in the $IJ$ cells are equal, $N/I = nJ$.

The construction of confidence intervals by the Tukey method will be
illustrated on the data in Fig. 17.23, for which the main effect of factor B
proved to be significant. Because the comparisons are among the $J$ column
means, $J$ takes the place of $I$ in Eq. (17.16).

Notice that $N = 27$, $I = 2$, $J = 3$. Furthermore, the number of
observations upon which each column mean is based equals $N/J = 27/3 = 9$.

FIG. 17.23 (cell means and cell frequencies)

                  Column 1     Column 2     Column 3
Row 1   mean      24.00        17.33        16.33
        n         3            3            3
Row 2   mean      26.17        15.34        18.00
        n         6            6            6
Column mean       25.44




The value of $MS_B$ is 232.78; hence, $F_B = 232.78/24.62 = 9.45$, which is
significant at the .01 level with 2 and 21 degrees of freedom. The questions
remain whether group 1 differs significantly from both groups 2 and 3 or
whether groups 2 and 3 are significantly different.

A set of simultaneous confidence intervals may be constructed around
the three differences between pairs of means by the method outlined in
Table 17.7.


TABLE 17.7 CONSTRUCTION OF A SET OF SIMULTANEOUS CONFIDENCE INTERVALS
AROUND THE DIFFERENCES BETWEEN PAIRS OF MEANS BY THE TUKEY
METHOD (J = 3, α = .01)

Difference between means              ${}_{1-\alpha}q_{J,\,N-IJ}\sqrt{MS_w/(N/J)}$     Confidence interval [Eq. (17.16)]

$\bar{X}_{.1.} - \bar{X}_{.2.} = 9.44$      $(4.61)\sqrt{24.62/9} = 7.62$                 (1.82, 17.06)
$\bar{X}_{.1.} - \bar{X}_{.3.} = 8.00$      7.62                                          (0.38, 15.62)
$\bar{X}_{.2.} - \bar{X}_{.3.} = -1.44$     7.62                                          (-9.06, 6.18)


The confidence intervals on $\mu_{.1} - \mu_{.2}$ and $\mu_{.1} - \mu_{.3}$ do not include zero;
hence, we conclude a significant difference between $\bar{X}_{.1.}$ and $\bar{X}_{.2.}$ and between
$\bar{X}_{.1.}$ and $\bar{X}_{.3.}$. The confidence interval on $\mu_{.2} - \mu_{.3}$ includes zero; there
exists no evidence that $\mu_{.2}$ and $\mu_{.3}$ differ. The confidence afforded by the
Tukey method is 99% that all three of these conclusions are simultaneously
correct.
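
The three intervals can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original text: the function name is an illustrative assumption, and the Studentized range percentile 4.61 is taken from the worked example rather than computed.

```python
import math


def tukey_intervals(differences, q, ms_w, n_per_mean):
    """Return the common half-width and the interval around each pairwise difference."""
    half_width = q * math.sqrt(ms_w / n_per_mean)
    return half_width, [(d - half_width, d + half_width) for d in differences]


# Differences between column means, MS_w, and N/J as given in the worked example.
differences = [9.44, 8.00, -1.44]
half_width, intervals = tukey_intervals(differences, q=4.61, ms_w=24.62, n_per_mean=9)

print(round(half_width, 2))
for low, high in intervals:
    # An interval that excludes zero indicates a significant pairwise difference.
    print(round(low, 2), round(high, 2), low > 0 or high < 0)
```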

The above techniques apply to factor A as well, of course, the only
alterations being in the use of ${}_{1-\alpha}q_{I,\,N-IJ}$ and of $N/I$ as a divisor of $MS_w$.

When one desires to perform multiple comparisons on the means of one
factor in a two-factor design in which these row or column means are based
on differing numbers of observations, the Scheffé method should be employed.
The Scheffé method was discussed at length in Chapter 16. The only
alterations to Eq. (16.5) occasioned by the fact that a two-factor ANOVA is being
analyzed are that $MS_w$ now stands for “within cells mean square” and
$n_1, \ldots, n_J$ now become $n_{.1}, \ldots, n_{.J}$, i.e., the $n$'s are the total numbers of
observations upon which the column (or row) means are based. The formula
for constructing a set of simultaneous confidence intervals around contrasts
of the row means by the Scheffé method is as follows:

$$(c_1 \bar{X}_{1..} + \ldots + c_I \bar{X}_{I..}) \pm \sqrt{MS_w \left(\frac{c_1^2}{n_{1.}} + \ldots + \frac{c_I^2}{n_{I.}}\right)} \sqrt{(I - 1)\,{}_{1-\alpha}F_{I-1,\,N-IJ}}. \qquad (17.17)$$



PROBLEMS AND EXERCISES 


1. Complete the ANOVA table and make F-tests of the null hypotheses for factors
A and B and the interaction of A and B at the α = .01 level of significance.

Source of variation df SS MS

Factor A 4 64.26

Factor B 5 46.85

A X B 

Within 120 1136.53 

Total 149 2411.69 

2. The figure below represents cell, row, and column population means in a
two-factor ANOVA design.

Factor B

[...] interaction between “sex” and
method of instruction is [...] The main effect of the aural-oral
method of instruction is +6 points on a French language mastery test; the
main effect for females is +2 points on the [...] In what direction
and by how many points should the [...]

[...] Traditional Orthography [...] Initial Teaching [...]





T.O. I.T.A. 



5. For each of the following arrangements of data, graph the interaction of factor
B, sex, with factor A, “treatments.” Place factor B on the abscissa so that each
treatment level yields one line in the graph. Which cases show ordinal interaction
(all questions of statistical significance aside)? Which show disordinal
interaction?




6. Graph the interaction of “sex” and “treatments” for the data under (b) of
Prob. 5. Place “treatments” on the abscissa. Is the interaction ordinal or
disordinal? Was the interaction ordinal or disordinal when “sex” was placed
on the abscissa of the graph?

7. A researcher is studying the effects on learning of inserting questions into
instructional materials. There is some doubt whether these questions would be
more effective preceding or following the passage about which the question is
posed. In addition, the researcher wonders if the effect of the position of the
questions is the same for factual questions and for questions that require the
learner to compose a thoughtful and original response. A group of 24 students
is split at random into four groups of six students each. One group is assigned
to each of the four combinations of factor B, “position of question (before vs.
after the passage),” and factor A, “type of question (factual vs.
thought-provoking).” After 10 hours of studying under these conditions, the 24 students
are given a 50-item test over the content of the instructional materials. The
following test scores are obtained:


•§ fsc' 

| 

t 

“ Thought 


Perform a two-factor ANOVA on the above data. Test the null hypotheses 
for both main effects and the interaction effects at the .10 level of significance. 

8. Analyze the following scores on a 50-item vocabulary test administered to 24
students of high and average intelligence (factor A) after one year of studying
a foreign language under one of three methods (factor B):

Factor B


Position of question
Before After


19 

23 

31 

28 

29 

26 

26 

27 

30 

17 

35 

32 

27 

21 

36 

29 

20 

26 

39 

31 

15 

24 

41 

35 


Aural-oral Translation Combined
method method methods

a. Perform F-tests of the null hypotheses for rows, columns, and interaction
at the .05 level.

b. These same data were analyzed under Prob. 8 at the end of Chapter 15.
However, there the data were regarded as a one-factor ANOVA, “method
of instruction” being the only factor. Verify that $SS_B$ and $MS_B$ are the
same whether the data are considered in a one-factor or two-factor design.
Also verify that $SS_w$ for the one-factor ANOVA of the data in Prob. 8 of
Chapter 15 equals $SS_A + SS_{AB} + SS_w$ for the two-factor ANOVA of the
same data.

9. D. L. Williams (1968) experimented with rewriting sixth-grade science materials 
so that the reading difficulty was appropriate for the third grade. (The 
following data are based on his study.) A group of 240 sixth-grade pupils were 
randomly assigned to one or the other of two levels of reading difficulty: 
“Grade 6 reading difficulty” or “Grade 3 reading difficulty." For three days, 
pupils read "Resources of the Sea,” a chapter in a science text, at either the 
third-grade or sixth-grade reading difficulty level. A 129-item multiple-choice 
test was administered at the end of the experiment to assess comprehension. 
Within both levels of reading difficulty, pupils were classified as either high, 
average, or low scorers on the reading section of the Stanford Achievement 
Test. The following data were obtained for the two levels of reading difficulty 
and three levels of reading achievement: 


                          Reading difficulty
Reading achievement       Grade 6            Grade 3

High                      n = 40             n = 40
                          X̄11 = 89.93        X̄12 = 93.89
                          s11 = 12.02        s12 = 13.02

Average                   n = 40             n = 40
                          X̄21 = 70.79        X̄22 = 72.55
                          s21 = 14.76        s22 = 15.90

Low                       n = 40             n = 40
                          X̄31 = 52.09        X̄32 = 56.64
                          s31 = 11.30        s32 = 12.21

Perform a two-factor ANOVA on these data. Test all three null hypotheses
at the .05 level of significance. (Hint: $MS_w$ is the average of the six within-cell
variances.)

10. All of the first-grade pupils in a school were given a reading readiness test in
September. According to the test manual, a total of 12 girls and 18 boys had
scores so low that “formal reading instruction should be postponed and ‘reading
readiness activities’ should be substituted.” A reading researcher randomly
assigned the 12 girls and 18 boys in equal numbers to one of the following three
conditions: (1) give 18 weeks of reading readiness activities before beginning
instruction; (2) give 9 weeks of readiness activities; (3) commence reading
instruction immediately (0 weeks of readiness activities). A reading achievement
test was administered to all 30 pupils at the end of the third grade. The
following grade-placement scores were obtained:


Boys

Sex

Girls


Reading readiness activities


18 weeks 9 weeks None


3.4
4.0
3.7
4.3
4.1
4.4
4.1
3.8
4.2
3.3
3.7
3.2
3.9
4.4
3.8
3.1
3.4
4.0
4.0
3.8
4.6
4.2
4.4
4.3
4.3
4.7
3.9
3.0
4.0
4.6


Perform F-tests at the .05 level of the null hypotheses of no difference among
amounts of readiness activities, no differences between the sexes, and no
interaction between the two factors.

11. In which of the following designs are the cell frequencies proportional, i.e.,
in which instances is $n_{ij} = (n_{i.} n_{.j})/n_{..}$?

(a)        1             2             3
     1     n11 = 4       n12 = 8       n13 = 2
     2     n21 = 6       n22 = 12      n23 = 3

(b)        1             2
     1     n11 = 5       n12 = 6
     2     n21 = 10      n22 = 14

(c)        1             2             3
     1     n11 = [...]   n12 = 10      n13 = 15
     2     n21 = 4       n22 = 8       n23 = 12
     3     n31 = 2       n32 = 4       n33 = 6

(d)        1             2
     1     n11 = 6       n12 = 10
     2     n21 = 3       n22 = 8
     3     n31 = 4       n32 = 8


12. From which one cell could two observations be discarded to achieve
proportional cell frequencies:

13. A sample of 40 tenth-grade pupils drawn randomly from either public or
parochial schools (factor B) can be classified as of either high, middle, or low
socio-economic status (factor A). The pupils earned the following
grade-placement scores on a mathematics achievement test:


Type of school

Public Parochial

9.5
10.1
10.4
8.5
High
8.7
10.4
11.6
10.3

9.3
10.2

8.4
11.4
10.4
11.1

10.5
10.6
9.4
10.3
Middle
9.8
10.4
10.6
10.6

10.6

11.0
10.7

8.6
8.9
10.0
9.9

7.3
9.7
9.5
10.6
Low
10.2
10.0
8.9
10.4

9.5
7.1

9.8



Because the cell frequencies are disproportional, perform an unweighted-means
analysis of variance to test the null hypotheses for rows, columns, and
interaction.

ONE-FACTOR AND MULTI-FACTOR ANALYSIS OF
VARIANCE: RANDOM, MIXED,
AND FIXED EFFECTS


18.1 

INTRODUCTION 


The purposes of this chapter are (1) to present an alternative to the analysis
of variance model that underlies Chapters 15 through 17; (2) to show [...]
combination of this new model and the [...] ANOVA model is of [...]
experimental research; and (3) to present a set of rules for [...] sums of
squares, degrees of freedom, [...] values of mean squares, etc.



18.2 

THE RANDOM-EFFECTS 
anova model 


In this section we shall [...]. The many similarities between
the techniques of Chapter 15 and those to be presented here should
substantially facilitate learning the material in this chapter. The model which we
shall now develop is called the random-effects analysis of variance model.

The reasons for having spoken of a “fixed-effects” ANOVA model in
Chapter 15 were never made clear. It is simpler to comprehend the meaning
of “fixed effects” when they can be contrasted with “random effects.” In the
ANOVA model of Chapter 15, we were primarily interested in making
statistical inferences about the set of main effects $\alpha_j$. The inference
to be made was from a set of sample data ($J$ groups of $n$ persons) to $J$
populations. We conceived of an infinite sequence of replications of the

[...] Our interest in the $J$ population means, or equivalently the effects
$\alpha_j = \mu_j - \mu$, [...] the $J$ populations so that [...] be sampled.
[...] variance of the means of a large collection of populations, only a few of
which could be observed in [...]. We might, then, randomly select
for a given experiment $J$ populations from a large collection
of populations and then sample $n$ observations from each population.
If we ran the experiment again, we would not “fix” the factor levels so that
the same $J$ were represented; rather we would randomly
sample a different set of $J$ populations. In this new model also there are
effects of the form $a_j = \mu_j - \mu$ [...]. However, whereas formerly the complete
set of $\alpha_j$'s was present for every replication of the fixed-effects analysis, now
we want to consider the case of having only a random sample of the $a_j$
effects present in a replication of the experiment. Hence the name
random-effects ANOVA model. Figure [...] is an attempt to capture graphically this
distinction between the fixed-effects and random-effects models.

As part of its testing program, a school system administers the
Metropolitan Achievement Tests in the 7th, 8th, and 9th grades of all junior high
schools in the system. Quite often these data are used to make
comparisons. Is school A showing higher performance than school B? Has
school C improved since [...]? The administrators realize that not all
aspects of the testing are standardized. In particular, the tests are given at
different times of the day in different schools. If there is some substantial
variability in test performance associated with the time of day at
administration of the test, then a comparison of schools A and B might be telling more
[...]

[...] something about all the $\mu_j$'s. The variance of the $\mu_j$'s tells us what sort of
differences to expect in the $\mu_j$'s, the mean test scores for the population of
students across all 360 testing periods.

We shall now formalize the random-effects model. Suppose we have a
virtually infinite population of $\mu_j$'s. (Again, we are not worried, because in
our example we have only 360 $\mu_j$'s.) Denote the average of all the $\mu_j$'s by
$\mu$. The $i$th student earns a score of $X_{ij}$ during the $j$th testing period. The
difference between the $i$th student's score at period $j$ and the mean of all
students' scores in that period is denoted by $e_{ij}$; hence,

$$X_{ij} = \mu_j + e_{ij}. \qquad (18.1)$$

The model in Eq. (18.1) will be altered in form slightly by deviating the
$\mu_j$'s around $\mu$, the mean of all the $\mu_j$'s:

$$X_{ij} = \mu + (\mu_j - \mu) + e_{ij}. \qquad (18.2)$$

Equation (18.2) follows by adding in and subtracting out $\mu$
from Eq. (18.1). A notational simplification of Eq. (18.2) will put the
random-effects ANOVA model into its customary form:

$$X_{ij} = \mu + a_j + e_{ij}, \qquad (18.3)$$

where $a_j = \mu_j - \mu$ and $e_{ij} = X_{ij} - \mu_j$.

Suppose that $\mu$, the mean test score of all students over all 360 testing
periods, is 30; and suppose that students average 4 points above 30 when
they take the test during the 36th testing period (9:35-10:25 a.m.); and
finally suppose that the fourth student scores 8 points below the average of all
students in that period. If the model in Eq. (18.3) holds, what test
score will the fourth student obtain during the 36th testing period?
$30 + 4 - 8 = 26$.

Initially, we were interested in the variance of the $\mu_j$'s. How has this
concern changed now that the $\mu_j$'s have been “eliminated” from the model?
Since $a_j$ is simply $\mu_j$ minus a constant, $\mu$, the variance of the $\mu_j$'s is the same
as the variance of the $a_j$'s. So differences in test scores due to the time of
day of administration of the test will be reflected in the variance of the $a_j$'s,
i.e., in $\sigma_a^2$.

Before we can proceed to estimate $\sigma_a^2$ and make statistical inferential
statements about it, it will be necessary to make some assumptions about the
model in Eq. (18.3):


1. The random effects $a_j$ are normally distributed with a mean of zero
and a variance of $\sigma_a^2$, i.e., $a \sim N(0, \sigma_a^2)$.

2. The error component $e_{ij}$ is normally distributed with mean zero and
variance $\sigma^2$, i.e., $e \sim N(0, \sigma^2)$.

3. The random effects $a_j$ and the error components $e_{ij}$ are independent,
from which it follows that $\rho_{ae} = 0$. Furthermore, the $e_{ij}$ are all
independent of one another.
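
What the three assumptions say about the data can be made concrete by simulating scores from the model of Eq. (18.3). The following is a minimal Python sketch, not part of the original text: the function name and the parameter values ($\mu = 30$, $\sigma_a = 4$, $\sigma = 6$) are illustrative assumptions, not estimates from the testing study.

```python
import random


def simulate_random_effects(mu, sigma_a, sigma, n_levels, n_per_level, seed=0):
    """Draw scores X_ij = mu + a_j + e_ij under the random-effects assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_levels):
        # Assumption 1: each level effect a_j is drawn from N(0, sigma_a^2).
        a_j = rng.gauss(0.0, sigma_a)
        # Assumptions 2 and 3: errors are N(0, sigma^2), drawn independently
        # of a_j and of one another.
        data.append([mu + a_j + rng.gauss(0.0, sigma) for _ in range(n_per_level)])
    return data


# Hypothetical parameter values chosen only for illustration.
scores = simulate_random_effects(mu=30.0, sigma_a=4.0, sigma=6.0, n_levels=5, n_per_level=7)
for level, group in enumerate(scores, start=1):
    print(level, round(sum(group) / len(group), 2))
```

Rerunning with a different seed samples a different set of levels, which is the sense in which the effects are "random" rather than "fixed."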


These assumptions have several implications for the testing study. If
the random-effects model is an adequate description of the data in the testing
study, then:

1. [...] of students' scores for all possible [...]

2. $e_{ij}$ should have a normal distribution around zero with a variance of
$\sigma^2$ [...] the homo[...]

3. [...] populations of scores for a randomly designated [...] the value of
$a_j$ would give no information about whether $e_{ij}$ was above or below zero, and
similarly, the value of one $e$ would give no information about whether
a different $e$ was above or below zero.

[...] techniques for actually estimating [...]. The problem to be solved is one of
estimating the variability in test scores due to the time of administering the
test and the variance in test scores due to differences among students
taking the test at the same time.
Data are gathered according to the following plan. First, $J$ levels of the
random-effects factor are randomly sampled. For example, $J = 5$ time
periods from the population of 360 periods are drawn at random. Just as
random sampling of persons from populations permits generalizations to the
population of persons, random sampling of levels of a factor allows
generalization to the population of levels. Second, a sample of $n$ observations is
drawn from [...] the $J$ levels. In our example, this
procedure is simulated by administering the Metropolitan Reading Test at
each of the five time periods to $n = 7$ students randomly drawn from the
population of all students. Restricting the $n$'s to be equal is not merely a
convenience here: the methods of this section have never been fully developed
for the unequal $n$'s case.

In our study, we have sampled $J = 5$ time periods and $n = 7$ students at
each, so that $nJ = 35$. The scores on
the 44-item reading test may be arranged as in Table 18.1.

TABLE 18.1 DATA FROM A STUDY OF THE EFFECTS OF TESTING TIME ON TEST SCORES


Testing time period


(11:09-11:59 a.m.) (10:14-11 [...]


It should come as no great [...].
Nor should it be unexpected that the variance of the $X_{ij}$'s within each
level produces a sample variance that estimates $\sigma^2$, the variance of the
population of scores for the $j$th level. The average of these $J$ sample
variances is the best estimator of $\sigma^2$. For the fixed-effects model,
the average within-sample variance was called the mean-square within:

$$MS_w = \frac{s_1^2 + \ldots + s_J^2}{J} = \frac{\sum_{j=1}^{J}\sum_{i=1}^{n} (X_{ij} - \bar{X}_{.j})^2}{J(n - 1)}. \qquad (18.4)$$

$MS_w$ is an estimator of $\sigma^2$ in the following senses:

1. $MS_w$ is an unbiased estimator of $\sigma^2$, the variance of scores
within each factor level.

2. With $J$ levels and $n$ observations at each level, the sampling
distribution of $MS_w$ is given by:

$$MS_w \sim \sigma^2 \frac{\chi^2_{J(n-1)}}{J(n - 1)},$$

i.e., the sampling distribution of $MS_w$ is $\sigma^2$ times a chi-square
distribution with $J(n - 1)$ degrees of freedom divided by $J(n - 1)$.

It is simple to prove that the expected value of $MS_w$ is $\sigma^2$ if one remembers
that the expected value of $\chi^2_n$ is simply $n$:

$$E(MS_w) = E\left[\sigma^2 \frac{\chi^2_{J(n-1)}}{J(n - 1)}\right] = \frac{\sigma^2}{J(n - 1)} [J(n - 1)] = \sigma^2.$$

As in Chapter 15, the definitional formula for $MS_w$ in Eq. (18.4) is not
convenient for computations. In its place, one should use the following
formula to compute $MS_w$:

$$MS_w = \frac{\sum_{j=1}^{J}\sum_{i=1}^{n} X_{ij}^2 - \sum_{j=1}^{J} \frac{\left(\sum_{i=1}^{n} X_{ij}\right)^2}{n}}{J(n - 1)}. \qquad (18.5)$$


The application of Eq. (18.5) to the data in Table 18.1 will be illustrated 
in Table 18.2. 


TABLE 18.2 ILLUSTRATION OF CALCULATION OF $MS_b$ AND $MS_w$ ON THE DATA
IN TABLE 18.1

Testing time period          1         2          3         4         5

Sum of scores                252       265        183       211       207
Sum of squared scores        9176      10,145     4981      6733      6349

Sum of all scores = 1118; sum of all squared scores = 37,384

$$SS_b = \frac{(252)^2 + (265)^2 + (183)^2 + (211)^2 + (207)^2}{7} - \frac{(1118)^2}{35} = \frac{254,588}{7} - \frac{1,249,924}{35}$$

$$= 36,369.71 - 35,712.11 = 657.60$$

$$MS_b = \frac{SS_b}{J - 1} = \frac{657.60}{4} = 164.40$$

$$SS_w = 37,384 - \frac{(252)^2 + (265)^2 + (183)^2 + (211)^2 + (207)^2}{7} = 37,384 - 36,369.71 = 1014.29$$

$$MS_w = \frac{SS_w}{J(n - 1)} = \frac{1014.29}{35 - 5} = 33.81$$

The function $MS_b$, mean-square between groups, with which you became
familiar in Chapter 15, was there seen to be closely related to the variance of $J$
sample means. Having defined $MS_b$ before, we can use it now in estimating
the variance of the population of $a_j$'s or, what is equivalent, the variance
of the population of $\mu_j$'s. You will recall that

$$MS_b = \frac{n \sum_{j=1}^{J} (\bar{X}_{.j} - \bar{X}_{..})^2}{J - 1}, \qquad (18.6)$$

i.e., $MS_b$ is $n$ times the sample variance of the $J$ sample means, $\bar{X}_{.1}, \ldots, \bar{X}_{.J}$.

Again, the definitional formula for a mean square makes for clumsy
calculation. A more efficient computational formula for $MS_b$ is:

$$MS_b = \frac{\sum_{j=1}^{J} \frac{\left(\sum_{i=1}^{n} X_{ij}\right)^2}{n} - \frac{\left(\sum_{j=1}^{J}\sum_{i=1}^{n} X_{ij}\right)^2}{nJ}}{J - 1}. \qquad (18.7)$$
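
The computational formulas in Eqs. (18.5) and (18.7) need only the level sums and the overall sum of squared scores. The following is a minimal Python sketch, not part of the original text: the function name is an illustrative assumption. The level sums 265, 183, 211, and 207 and the totals 1118 and 37,384 are read from Table 18.2; the remaining level sum, 252, is inferred here from the grand total of 1118.

```python
def mean_squares(level_sums, total_sum_of_squares, n):
    """Return (MS_b, MS_w) from level sums, using Eqs. (18.7) and (18.5)."""
    n_levels = len(level_sums)
    grand_total = sum(level_sums)
    # Sum over levels of (level sum)^2 / n; shared by both formulas.
    between_term = sum(s ** 2 for s in level_sums) / n
    ss_b = between_term - grand_total ** 2 / (n * n_levels)
    ss_w = total_sum_of_squares - between_term
    return ss_b / (n_levels - 1), ss_w / (n_levels * (n - 1))


ms_b, ms_w = mean_squares([252, 265, 183, 211, 207], 37384, n=7)
print(round(ms_b, 2), round(ms_w, 2))
```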

It can be shown, though we shall not do so here, that the expected
value of $MS_b$ has the following form:

$$E(MS_b) = \sigma^2 + n\sigma_a^2, \qquad (18.8)$$

i.e., on the average (the average across an infinite collection of independent
replications of the same study with $J$ randomly chosen levels with $n$ randomly
chosen observations at each level) $MS_b$ will equal $\sigma^2$, the same variance that
is estimated by $MS_w$, plus $n$ times the variance of the population of $a_j$'s
(which is the same as the variance of the population of $\mu_j$'s).

Thus we see that $MS_b$ estimates all that $MS_w$ estimates and more.
The something “more” is $n$ times the quantity of interest, $\sigma_a^2$.

It happens that the sampling distribution of $MS_b$ is given by the following:

$$MS_b \sim (\sigma^2 + n\sigma_a^2) \frac{\chi^2_{J-1}}{J - 1}, \qquad (18.9)$$

i.e., over repeated samplings of $J \times n$ layouts of data in which both factor
levels and observations within levels are randomly sampled, the sampling
distribution of $MS_b$ is that of chi-square with $J - 1$ degrees of freedom
multiplied by the constant $\sigma^2 + n\sigma_a^2$ over $J - 1$. Unlike the fixed-effects
model, $MS_b$ has a sampling distribution that is a constant times the
chi-square distribution even when there are differences among the $\mu_j$'s.

In Table 18.2 the calculation of $MS_b$ and $MS_w$ is illustrated using the
data in Table 18.1. We see in Table 18.2 that the best estimate we have of
$\sigma^2$, the variance of Metropolitan Reading Test scores in the population of
junior-high-school students, is $MS_w$, which equals 33.81. We shall denote
this estimate by $\hat{\sigma}^2$, the “^” indicating an estimate of the parameter below it.
The mean of all five groups is approximately 32; we would then expect that
Metropolitan Reading Test scores for junior-high-school students would be
normally distributed around a mean of 32 with a standard deviation of
approximately 5.8, the square root of 33.81.

The value of $MS_b$ is an estimate of $\sigma^2 + n\sigma_a^2$. We can obtain an estimate
of $\sigma_a^2$ alone from $MS_b$ and $MS_w$ in the following manner:

$$\hat{\sigma}_a^2 = \frac{MS_b - MS_w}{n}, \qquad E\left(\frac{MS_b - MS_w}{n}\right) = \frac{E(MS_b) - E(MS_w)}{n} = \frac{\sigma^2 + n\sigma_a^2 - \sigma^2}{n} = \frac{n\sigma_a^2}{n} = \sigma_a^2.$$

The best estimate we can attain of $\sigma_a^2$ is given by $(MS_b - MS_w)/n$. This
function estimates $\sigma_a^2$ unbiasedly. If $MS_w$ were larger than $MS_b$, which
could happen, the estimate $\hat{\sigma}_a^2$ of $\sigma_a^2$ would be negative. We know, of course,
that $\sigma_a^2$ is positive, or at the very least, zero. Thus, if ever a negative estimate
of $\sigma_a^2$ were obtained, it would automatically be set equal to zero instead.
For the data in Table 18.1, we have

$$\hat{\sigma}_a^2 = \frac{MS_b - MS_w}{n} = \frac{164.40 - 33.81}{7} = 18.66.$$

Thus the best estimate, $\hat{\sigma}_a^2$, of the variance of the means of populations
of test scores obtained at each of the 360 possible testing periods during the
day is 18.66. The standard deviation of these population means is
approximately 4.32.
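
The estimate and the rule for negative values fit in a single function. The following is a minimal Python sketch, not part of the original text: the function name is an illustrative assumption, and the mean squares are those reported for the data of Table 18.1.

```python
import math


def variance_component(ms_b, ms_w, n):
    """Estimate sigma_a^2 as (MS_b - MS_w) / n, set to zero if negative."""
    return max(0.0, (ms_b - ms_w) / n)


estimate = variance_component(164.40, 33.81, 7)
print(round(estimate, 2))               # estimated variance of the level means
print(round(math.sqrt(estimate), 2))    # corresponding standard deviation
print(round(estimate / 33.81, 2))       # ratio to the within-level variance estimate
print(variance_component(20.0, 30.0, 7))  # MS_w > MS_b: estimate is set to zero
```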

This estimate of $\sigma_a$ has some meaning in and of itself. For example, if
schools A and B chose to administer the Metropolitan Reading Test at times
1 and 2, respectively, and if $a_1 = +4$ and $a_2 = -4$ (not an unusual occurrence
since each time-period effect lies only one standard deviation from zero, the
mean of all $a_j$'s), then the two schools would be expected to differ by 8 points
on the test even if they were achieving at the same level. An obtained
difference between them of 8 points might easily be due to the fact that they
administered the test at different times of the day. Note that this would be
highly improbable if $\sigma_a$ were .25, for example; it would be highly unlikely
that an 8 point difference between school means could be attributed to “time
of day” effects if $\sigma_a$ were .25. An estimated variance component of 18.66 in
our example indicates that for test scores to be comparable from school to
school in the system, an effort must be made by the administration to see that
the tests are administered at the same time of day in each school.

While this “absolute” interpretation of $\sigma_a^2$ is occasionally possible, more
often $\sigma_a^2$ only takes on meaning when compared with $\sigma^2$. For example, we
might find it more informative to know that the ratio of $\hat{\sigma}_a^2$ to $\hat{\sigma}^2$ is 1 or 3 or
[...]. In our example, $\hat{\sigma}_a^2/\hat{\sigma}^2$ is 0.55. In the parlance popular among
statisticians, it is said that “the variance due to ‘time of day’ is about half as
great as the variance among students' test scores.”

One can identify at least three inferential questions in the one-way
random-effects model: (1) How can a confidence interval on $\sigma_a^2$ be established
around $\hat{\sigma}_a^2$? (2) How can a confidence interval on $\sigma_a^2/\sigma^2$ be established around
$\hat{\sigma}_a^2/\hat{\sigma}^2$? (3) How can the hypothesis that $\sigma_a^2 = 0$ be tested?

Unfortunately, the techniques for setting a confidence interval around
$\hat{\sigma}_a^2$ cannot be derived in a straightforward manner from the original
model. Approximate techniques are available; however, they are quite
complex (see Scheffé, 1959, pp. 231-35).

It is a relatively simple matter to establish a confidence interval on $\sigma_a^2/\sigma^2$
around $\hat{\sigma}_a^2/\hat{\sigma}^2$. The $1 - \alpha$ confidence interval on $\sigma_a^2/\sigma^2$ is given in Eq. (18.11):

$$\frac{1}{n}\left(\frac{MS_b}{MS_w} \cdot \frac{1}{{}_{1-(\alpha/2)}F_{J-1,\,J(n-1)}} - 1\right) < \frac{\sigma_a^2}{\sigma^2} < \frac{1}{n}\left(\frac{MS_b}{MS_w} \cdot \frac{1}{{}_{\alpha/2}F_{J-1,\,J(n-1)}} - 1\right). \qquad (18.11)$$

Both the $100[1 - (\alpha/2)]$ and the $100(\alpha/2)$ percentile points in $F_{J-1,\,J(n-1)}$
are required in Eq. (18.11). Recall that

$${}_{\alpha/2}F_{J-1,\,J(n-1)} = \frac{1}{{}_{1-(\alpha/2)}F_{J(n-1),\,J-1}}.$$

Suppose we wish to construct the 95% confidence interval on $\sigma_a^2/\sigma^2$ for
the data in Table 18.1. From Table E in Appendix A we first find the value
of ${}_{.975}F_{4,30} = 3.25$. Next the value of ${}_{.975}F_{30,4}$ is found, from which the other
required percentile is calculated as follows:

$${}_{.025}F_{4,30} = \frac{1}{{}_{.975}F_{30,4}} = \frac{1}{8.46} = 0.12.$$

Substituting the two percentiles along with the values of $MS_b$, $MS_w$, and $n$
into Eq. (18.11) yields the 95% confidence interval on $\sigma_a^2/\sigma^2$:

$$\left[\frac{1}{7}\left(\frac{164.40}{33.81} \cdot \frac{1}{3.25} - 1\right),\ \frac{1}{7}\left(\frac{164.40}{33.81} \cdot \frac{1}{0.12} - 1\right)\right].$$

The 95% confidence interval on $\sigma_a^2/\sigma^2$ extends from .07 to 5.64. This
interval tells how very uninformative our study has been. According to our
data we have little reason not to believe that $\sigma_a^2$ could be anywhere from
one-tenth as large as $\sigma^2$ to about 5½ times larger than $\sigma^2$. A study that produces
stable estimates of $\sigma^2$ and $\sigma_a^2$, i.e., short confidence intervals on $\sigma_a^2/\sigma^2$, cannot
be done cheaply; both J and n must be fairly large.
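
The arithmetic of Eq. (18.11) can be checked with a few lines of Python. What follows is only a sketch under stated assumptions: the function name and its arguments are ours, the two F percentiles are typed in from the printed table rather than computed, and n = 7 is inferred from the degrees of freedom 4 and 30 used above.

```python
def variance_ratio_interval(ms_b, ms_w, n, f_upper, f_lower):
    """Confidence limits on sigma_a^2 / sigma^2, following Eq. (18.11).

    f_upper: 100(1 - alpha/2) percentile of F with J-1 and J(n-1) df.
    f_lower: 100(alpha/2) percentile of the same F-distribution.
    Both percentiles are assumed to come from a printed F table.
    """
    f_ratio = ms_b / ms_w
    lower = (f_ratio / f_upper - 1.0) / n
    upper = (f_ratio / f_lower - 1.0) / n
    return lower, upper


# Values from the running example (n = 7 is inferred, not quoted).
ms_b, ms_w, n = 164.40, 33.81, 7

# Point estimates: variance component and its ratio to the error variance.
sigma_a2_hat = (ms_b - ms_w) / n
ratio_hat = sigma_a2_hat / ms_w

lower, upper = variance_ratio_interval(ms_b, ms_w, n, f_upper=3.25, f_lower=0.12)
print(round(sigma_a2_hat, 2), round(ratio_hat, 2))
print(round(lower, 2), round(upper, 2))
```

The lower limit agrees with the .07 reported above; the upper limit agrees with the reported 5.64 to within rounding of the tabled percentile.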

Testing the null hypothesis $H_0: \sigma_a^2 = 0$ is not of as much interest as was
the test of $H_0: \alpha_j = 0$ in the fixed model. We can often become sufficiently
skeptical to entertain the possibility that a small set of means are equal, as
in the fixed-effects model ANOVA, in which case we wish to test the
hypothesis of no differences. Certainly far less frequently does it strike us as
even remotely possible that an infinite number of levels of a factor all have
the same population mean, as must be true if $H_0: \sigma_a^2 = 0$ is true. Hence,
when the random-effects ANOVA model is applied, interest will center on
the estimation of $\sigma_a^2$ more often than on testing whether $\sigma_a^2 = 0$.

However, if compelling reasons exist for testing $H_0: \sigma_a^2 = 0$, it can be
tested readily as follows: If $F = MS_b/MS_w$ exceeds the $100(1 - \alpha)$ percentile
point of the F-distribution with $J - 1$ and $J(n - 1)$ degrees of freedom,
$H_0: \sigma_a^2 = 0$ can be rejected at the $\alpha$-level of significance. For example, with
$\alpha = .01$, $F = 164.40/33.81 = 4.86$, which exceeds 4.02, the 99th percentile
in $F_{4,30}$. Thus, $H_0: \sigma_a^2 = 0$ can be rejected at the .01 level. The ratio that is
referred to the F-distribution in this test is

$$F = \frac{MS_b}{MS_w} \qquad (18.12)$$


The developments of this section about the one-factor random-effects
ANOVA model are summarized in Table 18.3.

TABLE 18.3 SUMMARY OF THE ONE-FACTOR RANDOM-EFFECTS ANOVA

Source of variation: Between levels
    df:                  $J - 1$
    MS:                  $MS_b = n\sum_{j}(\bar X_{.j} - \bar X_{..})^2 / (J - 1)$
    E(MS):               $\sigma^2 + n\sigma_a^2$
    Estimated variance:  $\hat\sigma_a^2 = (MS_b - MS_w)/n$

Source of variation: Within levels
    df:                  $J(n - 1)$
    MS:                  $MS_w = \sum_{j}\sum_{i}(X_{ij} - \bar X_{.j})^2 / [J(n - 1)]$
    E(MS):               $\sigma^2$
    Estimated variance:  $\hat\sigma^2 = MS_w$

A word is in order about the consequences of violating the assumptions
of the random-effects ANOVA model. In Chapter 15 we saw that the
consequences for the fixed-effects ANOVA of violation of the normality
assumption were negligible; in addition, heterogeneous variances are
immaterial in the fixed model provided the n's are equal. However, the
situation is different in the random-effects ANOVA model. In particular,
if the kurtosis of the distribution of the random effects, $a_j = \mu_j - \mu$, deviates
from the kurtosis of the normal distribution, the validity of confidence
intervals on $\sigma_a^2$ or of the test of $H_0: \sigma_a^2 = 0$ can be seriously affected.
Nonnormality of the observations within levels of the random-effects factor
is of little consequence.

The third and final distinctly different analysis of variance model that will
be dealt with is a coalition of the fixed-effects and the random-effects models.
This union of the two models into a mixed-effects model is particularly
useful in experimental research.

As the name “mixed-effects model” might suggest, the mixed model
involves two sets of effects: one fixed and the other random. Naturally, then,
the model describes data gathered in a two-factor design, similar in appearance
only to the two-factor fixed model of Chapter 17. One factor, e.g., the
row factor, comprises a set of I fixed effects; the column factor is a random
sample of J random effects from a supposedly infinite population of normally
distributed effects. Consider a hypothetical experiment in which evaluators
are comparing three elementary science curricula: (1) the American
Association for the Advancement of Science curriculum (AAAS); (2) the
Elementary Science Study curriculum (ESS); (3) the Science Curriculum
Improvement Study (SCIS) curriculum. The dependent variable of the
experiment is “knowledge of the processes of scientific inquiry,” which is
measured by a 75-item objective test. The experiment is designed as follows:
10 elementary schools are chosen to participate; six experimental classrooms
are available within each school; by random methods, two classrooms
are assigned to each curriculum within each school. Observations of the
dependent variable are taken by administering the “processes of science”
test to all 60 experimental classrooms and averaging the scores of the students
within each class; thus, the 60 observations in the experiment are classroom
means. The data from the experiment may be laid out as in Table 18.4.


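TABLE 18.4 LAYOUT OF DATA FROM AN EXPERIMENT IN WHICH THREE ELEMENTARY
SCIENCE CURRICULA ARE COMPARED (SCORES ARE CLASS MEANS TO THE
NEAREST WHOLE NUMBER ON A 75-ITEM TEST)

Curriculum (i)     Row total     Row mean
AAAS                  603          30.15
ESS                   747          37.35
SCIS                  683          34.15
All curricula        2033

[The two class means in each of the 30 curriculum-by-school cells and the
10 school (column) totals are not legible in this copy.]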

The general observation in Table 18.4 is denoted by $X_{ijk}$, where i ranges
over rows (curricula) from 1 to 3, j ranges over columns (schools) from 1 to
10, and k ranges over observations within cells (classrooms) from 1 to 2. In
general, $i = 1, \ldots, I$; $j = 1, \ldots, J$; and $k = 1, \ldots, n$. The notation is
equivalent to that for the two-factor fixed-effects ANOVA of Chapter 17.

The two-factor design in Table 18.4 presents two sets of main effects plus
one set of interaction effects. The two main effects are “curricula,” which
will be called factor A, and “schools,” factor B. Clearly the three science
curricula were not sampled from a large population of such curricula, nor are
the evaluators interested in generalizing to a hypothetical population of other
curricula from which they could have conceivably been sampled. Interest
focuses on the question of which one of these curricula is superior to the other
two. Hence, the three main effects of factor A are considered “fixed.” On
the other hand, the schools represented in the experiment can be considered
a random sample from a population of schools; or more importantly, the
evaluators do not want the results of their study to be limited to the 10
schools in the experiment. Their conclusion about the relative superiority of
the curricula is meant to apply to the population of schools that was
sampled. The model for the observations in this design is aptly named the
mixed-effects model:

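$$X_{ijk} = \mu + \alpha_i + b_j + \alpha b_{ij} + e_{ijk} \qquad (18.13)$$

where $X_{ijk}$ is the kth observation in the ijth cell,
$\mu$ is the grand population mean of all observations,
$\alpha_i$ is the main effect of the ith level of the fixed factor,
$b_j$ is the main effect of the jth level of the random factor,
$\alpha b_{ij}$ is the interaction effect of the ith level of the fixed and the jth
level of the random factors, and
$e_{ijk}$ is the error component that accounts for variation of the
observations within the ijth cell.

The $\alpha_i$'s are fixed effects that sum to zero over the I rows, and within each
column the interaction effects sum to zero over rows:

$$\alpha b_{1j} + \cdots + \alpha b_{Ij} = 0 \quad \text{for all } j.$$

The set of $\alpha b_{ij}$'s for a single i (row) has a mean of zero over the infinite
population of random effects, but summing over the J sampled columns to
obtain a particular row mean, for example, will not cause the J values of $b_j$
or the J values of $\alpha b_{ij}$ to sum to zero.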

Suppose we wish to compare $\bar X_{1..}$ and $\bar X_{2..}$, i.e., the means of the
classrooms under the AAAS and ESS curricula. These two means have the
following structure in terms of the model in Eq. (18.13):

$$\bar X_{1..} = \frac{1}{Jn}\sum_{j=1}^{J}\sum_{k=1}^{n}\left(\mu + \alpha_1 + b_j + \alpha b_{1j} + e_{1jk}\right) = \mu + \alpha_1 + \bar b_{.} + \overline{\alpha b}_{1.} + \bar e_{1..},$$

$$\bar X_{2..} = \mu + \alpha_2 + \bar b_{.} + \overline{\alpha b}_{2.} + \bar e_{2..}.$$

The difference between $\bar X_{1..}$ and $\bar X_{2..}$ is

$$\bar X_{1..} - \bar X_{2..} = (\alpha_1 - \alpha_2) + (\overline{\alpha b}_{1.} - \overline{\alpha b}_{2.}) + (\bar e_{1..} - \bar e_{2..}).$$

Because the $\alpha b$'s do not sum to zero across the J columns and because
a replication of the experiment with a different set of J random effects would
produce different values of $\overline{\alpha b}_{1.}$ and $\overline{\alpha b}_{2.}$, the sampling variance of the
difference $\bar X_{1..} - \bar X_{2..}$ will contain a component for the interaction
effects, $\alpha b$. This fact will be fully appreciated when we discuss the expected
values of mean squares for the mixed model. But before entering upon that
subject, we must state the assumptions which must be made about the mixed
model of Eq. (18.13).

The following assumptions are made about the terms of the model

$$X_{ijk} = \mu + \alpha_i + b_j + \alpha b_{ij} + e_{ijk}.$$

1. The random effects, $b_j = \mu_{.j} - \mu$, are normally distributed with a
mean of zero and a variance of $\sigma_b^2$.

2. The interaction effects $\alpha b_{ij}$ are normally distributed over j for each i
with a mean of zero and a variance of $\sigma_{\alpha b}^2$.

3. The error components $e_{ijk}$ are distributed normally and independently
of the $b$'s and $\alpha b$'s with mean zero and variance $\sigma^2$.

A fourth very important assumption is necessary to insure the validity
of the hypothesis test of the fixed main effects:

4. For all pairs of levels of the fixed factor, the correlation (across the
population of random effects) of the scores under one level with the
scores under the other level of the pair must be the same. For
example, if in the population of all schools the correlation of
classroom mean test scores for the AAAS and ESS curricula is .50, then
the population correlation for AAAS and SCIS and for ESS and
SCIS must be .50 as well.

We shall indicate later how the methods of this section should be modified
when assumption 4, homogeneous correlations, is suspected to be seriously
violated.

With the model and its assumptions stated, we can proceed to develop
the methods by which null hypotheses about the fixed and random main
effects and the interaction effects may be tested. Our objective, then, is to
test the following three pairs of hypotheses:

1. $H_0: \sum \alpha_i^2 = 0$ vs. $H_1: \sum \alpha_i^2 \neq 0$.

2. $H_0: \sigma_b^2 = 0$ vs. $H_1: \sigma_b^2 \neq 0$.

3. $H_0: \sigma_{\alpha b}^2 = 0$ vs. $H_1: \sigma_{\alpha b}^2 \neq 0$.

Not surprisingly, perhaps, the road to the hypothesis tests leads through
the familiar sums of squares, degrees of freedom, mean squares, and expected
values of mean squares. In fact, the computations of SS, df, and MS in the
mixed model are identical to the calculations in the two-factor fixed model.
The two models do not part company until the expected values of the mean
squares are reached. The computations in the two-factor mixed model are
presented in Table 18.5 and are illustrated on the data of Table 18.4 in Table
18.6.


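TABLE 18.5 COMPUTATIONAL FORMULAS FOR SUMS OF SQUARES, DEGREES OF
FREEDOM, AND MEAN SQUARES IN THE TWO-FACTOR MIXED MODEL

Source of variation: Among rows (fixed factor A)
    df:  $I - 1$
    SS:  $SS_A = \sum_i \left(\sum_j \sum_k X_{ijk}\right)^2 / (Jn) - \left(\sum_i \sum_j \sum_k X_{ijk}\right)^2 / (IJn)$
    MS:  $MS_A = SS_A / (I - 1)$

Source of variation: Among columns (random factor B)
    df:  $J - 1$
    SS:  $SS_B = \sum_j \left(\sum_i \sum_k X_{ijk}\right)^2 / (In) - \left(\sum_i \sum_j \sum_k X_{ijk}\right)^2 / (IJn)$
    MS:  $MS_B = SS_B / (J - 1)$

Source of variation: Interaction of A and B
    df:  $(I - 1)(J - 1)$
    SS:  $SS_{AB} = \sum_i \sum_j \left(\sum_k X_{ijk}\right)^2 / n - \sum_i \left(\sum_j \sum_k X_{ijk}\right)^2 / (Jn) - \sum_j \left(\sum_i \sum_k X_{ijk}\right)^2 / (In) + \left(\sum_i \sum_j \sum_k X_{ijk}\right)^2 / (IJn)$
    MS:  $MS_{AB} = SS_{AB} / [(I - 1)(J - 1)]$

Source of variation: Within cells
    df:  $IJ(n - 1)$
    SS:  $SS_w = \sum_i \sum_j \sum_k X_{ijk}^2 - \sum_i \sum_j \left(\sum_k X_{ijk}\right)^2 / n$
    MS:  $MS_w = SS_w / [IJ(n - 1)]$


TABLE 18.6 ILLUSTRATION OF CALCULATION OF SUMS OF SQUARES AND MEAN
SQUARES ON THE DATA IN TABLE 18.4

1. $SS_A = \dfrac{(603)^2 + (747)^2 + (683)^2}{20} - \dfrac{(2033)^2}{60} = 520.53$

   $MS_A = \dfrac{SS_A}{I - 1} = \dfrac{520.53}{2} = 260.27$

2. $SS_B = \dfrac{(206)^2 + (165)^2 + \cdots + (164)^2}{6} - \dfrac{(2033)^2}{60} = 2059.65$

   $MS_B = \dfrac{SS_B}{J - 1} = \dfrac{2059.65}{9} = 228.85$

3. $SS_{AB} = 71{,}733.50 - 69{,}405.35 - 70{,}944.47 + 68{,}884.82 = 268.50$

   $MS_{AB} = \dfrac{SS_{AB}}{(I - 1)(J - 1)} = \dfrac{268.50}{18} = 14.92$

4. $SS_w = \sum_i \sum_j \sum_k X_{ijk}^2 - 71{,}733.50 = 237.50$

   $MS_w = \dfrac{SS_w}{IJ(n - 1)} = \dfrac{237.50}{30} = 7.92$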
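
The computational formulas for the two-factor design can be sketched in Python as follows. This is an illustration only: the function name and the small 2 x 3 data set with n = 2 are hypothetical (they are not the data of Table 18.4), and the design is assumed to have the same number of observations in every cell. The last line checks the value of $SS_A$ for the curriculum experiment from the row totals 603, 747, and 683.

```python
def two_factor_sums_of_squares(cells):
    """Sums of squares, df, and mean squares for a balanced two-factor design.

    cells[i][j] is the list of n observations in row i, column j.
    """
    I, J, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = sum(x for row in cells for cell in row for x in cell)
    correction = grand ** 2 / (I * J * n)

    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(cells[i][j]) for i in range(I)) for j in range(J)]
    cell_term = sum(sum(cell) ** 2 for row in cells for cell in row) / n
    raw = sum(x * x for row in cells for cell in row for x in cell)

    ss = {
        "A": sum(t * t for t in row_totals) / (J * n) - correction,
        "B": sum(t * t for t in col_totals) / (I * n) - correction,
        "w": raw - cell_term,
    }
    ss["AB"] = cell_term - correction - ss["A"] - ss["B"]
    df = {"A": I - 1, "B": J - 1, "AB": (I - 1) * (J - 1), "w": I * J * (n - 1)}
    ms = {source: ss[source] / df[source] for source in ss}
    return ss, df, ms


# Hypothetical data: 2 rows (fixed factor), 3 columns (random factor), n = 2.
cells = [
    [[3, 5], [4, 6], [7, 9]],
    [[6, 8], [5, 9], [10, 12]],
]
ss, df, ms = two_factor_sums_of_squares(cells)
print(ss, df, ms)

# Check of SS_A for the curriculum experiment, using only the row totals.
ss_a_curricula = (603 ** 2 + 747 ** 2 + 683 ** 2) / 20 - 2033 ** 2 / 60
print(round(ss_a_curricula, 2))
```

Nothing in these computations depends on whether a factor is fixed or random; as the text notes, the models differ only in the expected mean squares.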


Once again it is by way of the expected values of mean squares that we
can see how various ratios of mean squares bear on the question of whether
or not a null hypothesis is true. The expected values of the mean squares in
Table 18.5 (long-run average values over replications of the experiment in
Table 18.4 with different schools and classrooms each time, for example)
are given in Table 18.7.


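TABLE 18.7 EXPECTED VALUES OF MEAN SQUARES IN THE TWO-FACTOR MIXED-EFFECTS
MODEL

Mean square                       E(MS)

$MS_A$ (fixed factor)             $\sigma^2 + n\sigma_{\alpha b}^2 + nJ\sum_i \alpha_i^2/(I - 1)$
$MS_B$ (random factor)            $\sigma^2 + nI\sigma_b^2$
$MS_{AB}$ (mixed interaction)     $\sigma^2 + n\sigma_{\alpha b}^2$
$MS_w$                            $\sigma^2$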


The expected mean squares in Table 18.7 present a far different picture
from the E(MS)'s for the two-factor fixed-effects model. Particularly, the
variance of the interaction effects, $\sigma_{\alpha b}^2$, that appears in $E(MS_A)$ looks unusual
at first. The presence of $\sigma_{\alpha b}^2$ in $E(MS_A)$ was anticipated in our discussion
of comparing $\bar X_{1..}$ and $\bar X_{2..}$, the difference between which contained $\bar e_{1..} - \bar e_{2..}$
[from which we obtain $\sigma^2$ in $E(MS_A)$] and $\overline{\alpha b}_{1.} - \overline{\alpha b}_{2.}$ [from which we
obtain $\sigma_{\alpha b}^2$ in $E(MS_A)$]. No comparable situation existed in the two-factor
fixed model; there it was seen that summing across any one row would “add
out” the J interaction effects since

$$\sum_{j=1}^{J} \alpha\beta_{ij} = 0.$$

Consider first the problem of testing the null hypothesis that all the $\alpha_i$'s
are zero, i.e.,

$$H_0: \sum_{i=1}^{I} \alpha_i^2 = 0.$$

You must repress what is perhaps your first inclination to divide $MS_A$ by
$MS_w$ and refer the ratio to the F-distribution. Notice that $MS_A/MS_w$ does
not bear directly on the question of whether $\sum \alpha_i^2$ is zero or not. The
quantity $\sum \alpha_i^2$ could be zero and yet $MS_A/MS_w$ might be large because
$\sigma_{\alpha b}^2$ is not zero. The difference between $MS_A$ and $MS_w$ estimates
$n\sigma_{\alpha b}^2 + nJ\sum \alpha_i^2/(I - 1)$ instead of just $nJ\sum \alpha_i^2/(I - 1)$, as it did
in the two-factor fixed model. It can be seen by inspecting the above expected
mean squares that $E(MS_A)$ differs from $E(MS_{AB})$ only in that term, $\sum \alpha_i^2$,
which is being tested. Thus, the size of the discrepancy between $MS_A$ and
$MS_{AB}$, or the ratio of $MS_A$ to $MS_{AB}$, bears on the size of $\sum \alpha_i^2$. More
specifically, given the assumptions of the mixed-effects model (particularly
assumption 4), $F = MS_A/MS_{AB}$ will have the F-distribution with degrees of
freedom $I - 1$ and $(I - 1)(J - 1)$ if

$$H_0: \sum \alpha_i^2 = 0$$

is true. A positive value of $\sum \alpha_i^2$ will tend to inflate $MS_A$ above $MS_{AB}$ and
give values of $F = MS_A/MS_{AB}$ that are larger than the typical values in the
F-distribution.

Though the null hypothesis $H_0: \sigma_b^2 = 0$ is of less interest than the
hypothesis about the fixed main effects, it can be tested by referring the ratio
$F = MS_B/MS_w$ to the table of the F-distribution with $J - 1$ and $IJ(n - 1)$
degrees of freedom. Moreover, $\sigma_b^2$ can be estimated by $(MS_B - MS_w)/(nI)$,
and the $1 - \alpha$ confidence interval on $\sigma_b^2/\sigma^2$ can be constructed by means
of Eq. (18.11) if $MS_B$, $nI$, and $IJ(n - 1)$ are substituted for $MS_b$, $n$, and
$J(n - 1)$ in that equation.

The null hypothesis $H_0: \sigma_{\alpha b}^2 = 0$ may be tested by referring
$F = MS_{AB}/MS_w$ to the F-distribution with $(I - 1)(J - 1)$ and $IJ(n - 1)$ degrees
of freedom.

For the data in Table 18.4, the three null hypotheses mentioned above 
have been tested at the .05 level of significance. The results appear in Table 
18.8. 


TABLE 18.8 NULL HYPOTHESIS TESTS FOR THE DATA IN TABLE 18.4

Null hypothesis                F-ratio                                    Critical F ($\alpha = .05$)       Decision

$H_0: \sum \alpha_i^2 = 0$     $F = MS_A/MS_{AB} = 260.27/14.92 = 17.44$   ${}_{.95}F_{2,18} = 3.55$    Reject $H_0$

$H_0: \sigma_b^2 = 0$          $F = MS_B/MS_w = 228.85/7.92 = 28.90$       ${}_{.95}F_{9,30} = 2.21$    Reject $H_0$

$H_0: \sigma_{\alpha b}^2 = 0$ $F = MS_{AB}/MS_w = 14.92/7.92 = 1.88$      ${}_{.95}F_{18,30} = 1.95$   Do not reject $H_0$
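
The choice of denominator for each F-ratio in the mixed model can be made explicit in a short Python sketch. The function and dictionary names are ours, the mean squares are those of Table 18.6, and the critical values are typed in from the printed F table rather than computed.

```python
def mixed_model_f_ratios(ms_a, ms_b, ms_ab, ms_w):
    """F-ratios for the two-factor mixed model (A fixed, B random).

    The fixed effect is tested against the interaction mean square;
    the random effect and the interaction are tested against MS_w.
    """
    return {
        "A": ms_a / ms_ab,
        "B": ms_b / ms_w,
        "AB": ms_ab / ms_w,
    }


# Mean squares from the curriculum example (I = 3, J = 10, n = 2).
ms_a, ms_b, ms_ab, ms_w = 520.53 / 2, 228.85, 14.92, 7.92
ratios = mixed_model_f_ratios(ms_a, ms_b, ms_ab, ms_w)

# 95th percentiles taken from a printed table (an assumption of this sketch).
critical = {"A": 3.55, "B": 2.21, "AB": 1.95}
decisions = {
    source: "reject H0" if ratios[source] > critical[source] else "do not reject H0"
    for source in ratios
}

# Estimate of the variance component for schools: (MS_B - MS_w) / (nI).
sigma_b2_hat = (ms_b - ms_w) / (2 * 3)

for source in ratios:
    print(source, round(ratios[source], 2), decisions[source])
print(round(sigma_b2_hat, 2))
```

Dividing $MS_A$ by $MS_w$ instead would have given a much larger ratio, which is exactly the temptation the text warns against.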


We see in Table 18.8 that the main effects for both the fixed and random
factors are quite statistically significant. The test for interaction is significant
with an $\alpha$ of .10 but not at the .05 level. It would now be legitimate to apply
the Tukey method of multiple comparisons, which was discussed in Chapter
16, to factor A (“science curricula”) to determine which pairs of means differ
significantly. In place of $MS_w$ with $J(n - 1)$ degrees of freedom that are
used in the Tukey method in the one-factor ANOVA, one uses $MS_{AB}$ with
$(I - 1)(J - 1)$ degrees of freedom (Scheffé, 1959, p. 270).

The two-factor mixed-effects ANOVA model with n = 1, i.e., one
observation per cell, is frequently encountered. For example, six persons
(random factor B) may each be observed under four treatment conditions
(fixed factor A), as shown in Fig. 18.2. This design is commonly referred to as
a repeated measures design, because observations of persons are made several
times instead of once. In general, an observation in the repeated measures
design is denoted by $X_{ij}$, i indicating the row in which an observation lies
and j indicating its column. The mean squares between rows (the fixed
factor), between columns (the random factor), and for the interaction of rows
and columns can all be calculated by means of the computational formulas in
Table 18.5 by letting n = 1 and dropping out the index k and all summations
over it. Since n = 1, there are $IJ(n - 1) = 0$ degrees of freedom for variation
within cells; thus, the variance of scores around the mean of the ijth cell
cannot be estimated. The analysis of variance table for the sources of
variation that do exist appears as Table 18.9.

                              Persons (j)
                    1       2       3       4       5       6
               1  X_11    X_12    X_13    X_14    X_15    X_16
Treatments     2  X_21    X_22    X_23    X_24    X_25    X_26
    (i)        3  X_31    X_32    X_33    X_34    X_35    X_36
               4  X_41    X_42    X_43    X_44    X_45    X_46

FIG. 18.2


TABLE 18.9 ANOVA TABLE FOR THE TWO-WAY MIXED-EFFECTS ANOVA WITH n = 1

Source of variation           df                  E(MS)

Factor A (fixed factor)       $I - 1$             $\sigma^2 + \sigma_{\alpha b}^2 + J\sum_i \alpha_i^2/(I - 1)$
Factor B (random factor)      $J - 1$             $\sigma^2 + I\sigma_b^2$
Interaction of A and B        $(I - 1)(J - 1)$    $\sigma^2 + \sigma_{\alpha b}^2$


If we assume that the I levels of the fixed factor would all correlate
equally in the population of persons (the random factor), then
$F = MS_A/MS_{AB}$ will have an F-distribution with $I - 1$ and $(I - 1)(J - 1)$
degrees of freedom when $H_0: \sum \alpha_i^2 = 0$ is true. Hence, $F = MS_A/MS_{AB}$
can be compared with ${}_{1-\alpha}F_{I-1,\,(I-1)(J-1)}$ to test $H_0$ at the $\alpha$-level of significance.
No tests of the null hypotheses $H_0: \sigma_b^2 = 0$ and $H_0: \sigma_{\alpha b}^2 = 0$ are possible in the
mixed-effects model when n = 1.


An important assumption for the test of $H_0: \sum \alpha_i^2 = 0$ to be valid in the
mixed model is that the correlations of all pairs of levels of the fixed factor
across the population of random factor levels must be the same. Violations
of this assumption work to increase the actual probability of a type I error
above the value believed to hold, the “nominal” value. When heterogeneous
correlations among the pairs of levels of the fixed factor are suspected,
special measures must be taken to insure the validity of the F-test of the null
hypothesis about the fixed main effect. Box (1954) showed that the effect
of heterogeneous correlations of the fixed factor levels was to produce an
F-ratio that has fewer degrees of freedom than $I - 1$ and $(I - 1)(J - 1)$.
Greenhouse and Geisser (1959) showed that, at most, the degrees of freedom
need be reduced to 1 and $J - 1$ for the distribution of $F = MS_A/MS_{AB}$ under
a true null hypothesis. (See Lana and Lubin, 1963.) These findings suggested
the following procedure:

1. If $F = MS_A/MS_{AB}$ exceeds the $1 - \alpha$ percentile point in the
F-distribution with 1 and $J - 1$ degrees of freedom, reject
$H_0: \sum \alpha_i^2 = 0$ at the $\alpha$-level of significance. (The conservative test.)

2. If $F = MS_A/MS_{AB}$ falls below the $1 - \alpha$ percentile point in the
F-distribution with $I - 1$ and $(I - 1)(J - 1)$ degrees of freedom, do
not reject $H_0: \sum \alpha_i^2 = 0$ at the $\alpha$-level of significance. (The apparent
test.)

3. If $F = MS_A/MS_{AB}$ falls between ${}_{1-\alpha}F_{I-1,\,(I-1)(J-1)}$ and ${}_{1-\alpha}F_{1,\,J-1}$,
one must resort to a multivariate technique known as Hotelling's $T^2$
test (see Scheffé, 1959, or Winer, 1962).

The strategy for analyzing a repeated measures design is depicted in
Fig. 18.3.

To test $H_0: \mu_1 = \cdots = \mu_I$:
    Is $F > F_{I-1,\,(I-1)(J-1)}$? (apparent test)
        No:  $H_0$ tenable.
        Yes: Is $F > F_{1,\,J-1}$? (conservative test)
            Yes: Reject $H_0$.
            No:  Perform Hotelling's $T^2$ test (see Winer, 1962, pp. 632-35).

FIG. 18.3 Schematic diagram of the analysis of the repeated measures
design.
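
The three-step strategy can be written as a small Python function. This is a sketch under assumptions: the function name and return strings are ours, and the two critical values must be supplied from an F table. The illustration uses the design of Fig. 18.2 (I = 4 treatments, J = 6 persons) with $\alpha = .05$, for which the tabled 95th percentiles are 3.29 for 3 and 15 degrees of freedom and 6.61 for 1 and 5 degrees of freedom; the F-ratios passed in are hypothetical.

```python
def repeated_measures_decision(f, crit_apparent, crit_conservative):
    """Three-step strategy for the fixed-effect test in a repeated measures design.

    f:                  observed F = MS_A / MS_AB
    crit_apparent:      1 - alpha percentile of F with I-1 and (I-1)(J-1) df
    crit_conservative:  1 - alpha percentile of F with 1 and J-1 df
    """
    if f > crit_conservative:
        return "reject H0"                # significant even by the conservative test
    if f < crit_apparent:
        return "do not reject H0"         # not significant even by the apparent test
    return "perform Hotelling's T^2 test"  # the two tests disagree


# I = 4, J = 6, alpha = .05; critical values read from a printed table.
crit_apparent, crit_conservative = 3.29, 6.61

for f in (8.0, 2.0, 5.0):  # hypothetical observed F-ratios
    print(f, repeated_measures_decision(f, crit_apparent, crit_conservative))
```

Only the third outcome requires any computation beyond the ordinary ANOVA table, which is the practical appeal of the strategy.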


18.4 

RULES OF THUMB FOR 
WRITING THE ANOVA TABLE 

The following sections of this chapter consist of a collection of “rules of
thumb” for finding all entries in an analysis of variance (ANOVA) table for a
large class of ANOVA models.* For designs in which each pair of factors is
completely crossed or nested, in which each factor is random or fixed, and
in which the same number of replications (observations) are taken within
the smallest subdivision of the design (cell), rules are provided that specify
the possible sources of variation, the associated degrees of freedom,
computational formulas for sums of squares, and expectations of mean squares.
If you master these simple rules you will hopefully come to regard complex
analyses of variance as less inconvenient and be encouraged to attempt them
when they are appropriate.

FIG. 18.4 Data layout of the example used to illustrate the application of
the ANOVA rules of thumb. (Labels shown: Factor C; Factor T, treatments.)

Many sources exist that provide rules of thumb for finding some of the
entries in an ANOVA table from some of the designs considered here (e.g.,
see Bennett and Franklin, 1954; Cornfield and Tukey, 1956; Guenther, 1964;
Henderson, 1959; Scheffé, 1959; Schultz, 1955; Winer, 1962). None of them,
however, duplicates the coverage of this section, although the somewhat
inaccessible Henderson (1959) paper is the most similar to the presentation
to follow.

The following experiment involving six classrooms will be used to
illustrate the application of the rules presented. Three of six classrooms included
in an experiment are from public schools ($P_1$) and three from private schools
($P_2$). Each classroom was administered five treatments. Suppose that each
class was split into 10 subgroups at random and that two subgroups responded
independently under each of the five treatments. Thus, two observations, $X_1$
and $X_2$, on the single dependent variable X exist for each classroom under
each treatment. This design is illustrated in Fig. 18.4.


* These sections are revised from Millman and Glass (1967).



18.5 

DEFINITIONS OF TERMS 

I-A. Crossed and Nested Factors

Two factors are crossed if every level (the different categories of a factor are
called levels) of one of the factors appears with every level of the other factor.
That is, there must be at least one observation for every possible combination
of levels of factors that are completely crossed. Thus “type of school” and
“treatments” (having two and five levels, respectively) are crossed since there
are observations taken at each of the 10 school-treatment combinations.

A factor is said to be nested in a second factor if each level of the first
(the nested factor) appears in exactly one level of the second factor. In our
experiment, “classroom” is nested. No classroom appears in both a public
($P_1$) and a private ($P_2$) school. If, however, the same three classrooms were
involved in both $P_1$ and $P_2$, then C would not be nested in P. Nesting exists
when one level of a variable does not appear with all levels of another
variable. Since $C_1$ under $P_1$ is not the same class as $C_4$ under $P_2$, $C_1$ does not
appear in both levels of P. Note that $C_1$ is combined with all levels of T
(“treatments”); the factor “classroom” is not nested in T. For the purposes
of this discussion, any nested factor must have the same number of levels in
each level of the factor in which it is nested. In our illustration, there are three
levels of “classroom” nested under both levels of “type of school.”

I-B. Random and Fixed Factors

A factor may be considered random if the levels of that factor used in the
study are a simple random sample from a population of levels with normally
distributed effects. “Students” and “classrooms” are two factors frequently
considered random. Results of an ANOVA may be generalized to the
population of levels of a random factor. When all levels of a factor are in
the study (e.g., male-female or high-average-low), when only the levels of
interest to the investigator are in the study (e.g., method A and method B
where the other methods are not of interest), or when a systematic selection
of levels is used, the factor is considered fixed. Results of an ANOVA may
be generalized only to the population of replications of the experiment in
which the specific levels of the fixed factor included in the study are present.
For example, a study that systematically selected grades three, six, nine, and
12 can generalize its results only to those four grades.

Actually, the status (fixed or random) of a factor depends as much upon
the population of replications of a study to which one wishes to generalize as
it does upon the way in which the levels were chosen. Five levels of a factor
could be randomly sampled from a virtually infinite population of levels, but
the factor would be “fixed” if one made inferences to replications of the
study in which only those five levels chosen appear. This is an abstruse
point about which we will have no more to say.

In our example, “classrooms” is considered a random factor, and “type
of school” and “treatments” are fixed. We shall also call “replication”
within the smallest cell of a design a nested factor that is always random and is
nested within all the other factors of the design. This point is important and
allows us to specify rules of thumb that are simpler than many others proposed.


18.6 

DETERMINING THE POSSIBLE 
LINES (SOURCES OF 
VARIATION) OF THE 
ANOVA TABLE 


II-A. Notation

1. A factor that is not nested is denoted by a capital letter.

2. A nested factor is denoted by a capital letter followed by a colon and
then the letter or letters denoting the factors within which it is
nested; e.g., A:B denotes factor A nested within factor B.

3. An interaction is denoted by a combination of the capital letters of
the interacting factors followed by a colon and the letter or letters of
the factors within which the interaction is nested, e.g., AB or AB:C.


II-B. Rules

1. The ANOVA table has a line for each factor, both crossed and
nested (this includes the factor “replications”).

2. The ANOVA table has a line for all possible (two-factor, three-factor,
etc.) interactions among factors. To determine which interactions
can exist, apply to all pairs, trios, etc., of factors the following rules:

a. Write to the left of the colon the letters appearing to the left
of the colons in the factors being combined (if no colon appears
in the notation for a factor, the colon is understood to lie at the
right of the letter).

b. Write to the right of the colon, with no repetition of a letter,
those letters to the right of the colons in the factors being
combined.

c. Delete any combination having a letter to the left of the colon
that is repeated to the right of the colon.

II-C. Illustration of the Rules
Under II-B (for the example
in Sec. 18.4)

1. P (types of school), T (treatments), C:P (classrooms nested within
P), and R:PCT (replications nested within P, C, T combinations)
represent the crossed and nested factors and are lines in the table
by rule II-B.1.

2. The possible interactions among the factors above include:

PT which is retained and may be written TP.

PC:P which is deleted because P appears both before and after the
colon (rule II-B.2c).

TC:P which is retained and may be written CT:P.

PTC:P which is also deleted because P appears both before and
after the colon.

All the interactions involving R:PCT are deleted because of rule
II-B.2c.

3. Thus, the lines of the ANOVA table in our example consist of P,
C:P, T, PT, CT:P, and R:PCT.
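
Rules II-B.1 and II-B.2 are mechanical enough to be sketched in Python. The function names below are ours, and the sketch assumes that every factor is named by a single capital letter, written in the colon notation of II-A.

```python
from itertools import combinations


def parse(name):
    """Split a factor such as 'C:P' into its letters left and right of the colon."""
    left, _, right = name.partition(":")
    return left, right


def anova_lines(factors):
    """Return the lines of the ANOVA table implied by rules II-B.1 and II-B.2."""
    lines = list(factors)  # rule II-B.1: every factor is a line
    parsed = [parse(f) for f in factors]
    for size in range(2, len(parsed) + 1):
        for combo in combinations(parsed, size):
            # rule II-B.2a: letters left of the colons go to the left
            left = "".join(l for l, _ in combo)
            # rule II-B.2b: letters right of the colons, without repetition
            right = ""
            for _, r in combo:
                for letter in r:
                    if letter not in right:
                        right += letter
            # rule II-B.2c: delete if a left-hand letter recurs on the right
            if any(letter in right for letter in left):
                continue
            lines.append(left + (":" + right if right else ""))
    return lines


lines = anova_lines(["P", "T", "C:P", "R:PCT"])
print(lines)
```

For the example design the function returns the four factors followed by PT and TC:P, the latter being the interaction written CT:P in the text.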

18.7 

DETERMINING THE DEGREES 
OF FREEDOM FOR SOURCES 
OF VARIATION 

III-A. Notation

1. The number of levels of a factor not nested within any other factor
is denoted by the lower case of the letter identifying the factor. In
our example, p = 2 and t = 5.

2. The number of levels of a nested factor within each level or
combination of levels of the factors in which it is nested is denoted by the
lower case of the letter to the left of the colon identifying the nested
factor. For example, in the nested classification C:P, the number
of levels of C in each level of P is denoted by c, which is 3 in our
example. In the nested classification R:PCT, r denotes the number
of replications within a cell, which equals 2 in our example.

3. The total number of observations N equals the product of all the
lower case letters for the crossed and nested classifications. In our
example, this number is p × c × t × r = 2 × 3 × 5 × 2 = 60.

III-B. Rules

1. Let a lower case letter correspond to each capital letter. The
degrees of freedom for any line in the ANOVA table are found by
subtracting one from each lower case letter to the left of the colon
and multiplying the grand product of these differences by the grand
product of the lower case letters to the right of the colon.

2. As a check, the computed degrees of freedom should add to N - 1.

III-C. Illustration of the Rules
Under III-B (for the example
in Sec. 18.4)

1. The degrees of freedom for P = (p - 1) = (2 - 1) = 1.

The degrees of freedom for C:P = (c - 1)p = (3 - 1)2 = 4.

The degrees of freedom for T = (t - 1) = (5 - 1) = 4.

The degrees of freedom for
PT = (p - 1)(t - 1) = (2 - 1)(5 - 1) = 4.

The degrees of freedom for
CT:P = (c - 1)(t - 1)p = (3 - 1)(5 - 1)2 = 16.

The degrees of freedom for
R:PCT = (r - 1)pct = (2 - 1)(2)(3)(5) = 30.

2. As a check, 1 + 4 + 4 + 4 + 16 + 30 = N - 1 = 59.
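
Rule III-B.1 and the check in III-B.2 can be sketched in Python as follows. The function name is ours, and the sketch again assumes single-letter factor names, with the number of levels stored under the corresponding lower case letter.

```python
from math import prod


def degrees_of_freedom(line, levels):
    """Apply rule III-B.1 to a line such as 'CT:P'.

    levels maps each lower case letter to its number of levels.
    """
    left, _, right = line.partition(":")
    reduced = prod(levels[letter.lower()] - 1 for letter in left)
    nesting = prod(levels[letter.lower()] for letter in right)
    return reduced * nesting


levels = {"p": 2, "c": 3, "t": 5, "r": 2}
lines = ["P", "C:P", "T", "PT", "CT:P", "R:PCT"]
df = {line: degrees_of_freedom(line, levels) for line in lines}
n_total = prod(levels.values())

print(df)
print(sum(df.values()), n_total - 1)  # rule III-B.2: the two should agree
```

If the two numbers on the last line disagree, a line of the table has been omitted or a factor has been misclassified as crossed or nested.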

18.8 

COMPUTING SUMS OF 
SQUARES 


IV-A. Notation

Denote an observation by X having as subscripts all the different lower
case letters used in expressing the degrees of freedom. The same lower case
letters are used to denote the upper limits of the subscripts. For example,
$X_{pctr}$ denotes a general observation on the dependent variable in our
example.


IV-B. Rules

1. For each line in the ANOVA table write down the degrees of
freedom (df) in their symbolic form (i.e., using lower case letters)
and expand algebraically. For example, the line PT has degrees of
freedom (p - 1)(t - 1) which equals +pt - p - t + 1. The
computational formula for the sum of squares of a source of variation
will consist of as many terms as there are terms in the expanded
symbolic expression for the degrees of freedom (four terms in the
case of the PT interaction), and these terms will have the same
algebraic signs as their corresponding terms in the symbolic
expression for the df [+, -, -, and + in the case of the expansion of
(p - 1)(t - 1) shown above].

2. For each term in the expanded algebraic representation for the df
write a multiple summation corresponding to each subscript of the
general observation. Precede the summation with the algebraic sign
of the term to which it corresponds. For example, for the term
+pt in the algebraic expansion of (p - 1)(t - 1) one would write

$$+\sum_{1}^{p}\sum_{1}^{t}\sum_{1}^{c}\sum_{1}^{r} X_{pctr}$$

3. For each multiple summation expression, place within parentheses
X and those summation signs whose upper limits do not appear in the
corresponding term in the expanded expression for the df. For
example, for the term +pt one would place the parentheses as
follows:

$$+\sum_{1}^{p}\sum_{1}^{t}\left(\sum_{1}^{c}\sum_{1}^{r} X_{pctr}\right)$$

4. Square the expression inside the parentheses and divide by the total
number of observations summed over to get the quantity inside
the parentheses. This number will be the product of the upper
limits of the summation signs inside the parentheses. If no
summation sign appears inside the parentheses, one “sums” over 1
value, so the term is divided by 1. The part of the computational
formula for the PT interaction that corresponds to the +pt term is
then

$$+\sum_{1}^{p}\sum_{1}^{t}\frac{\left(\sum_{1}^{c}\sum_{1}^{r} X_{pctr}\right)^2}{cr}$$

1V-C. Illustration of the Rules 
Under IV-B (for the example in 
Sec. 18.3) 

In Sec. 18.7 above, six sources of variation were identified for the example 
problem. Thus, six sums of squares must be calculated in the analysis of the 
design. Only the formulas for the sums of squares PT and R:PCT will be 
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demonstrated- Attempt to write the remaining formulas by following rules 
IV-B.l through 1V-B.4. Four subscripts are needed to denote a general 
observation: X vctr . 


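1. Sum of squares for PT.

Rule 1. df for PT $= (p - 1)(t - 1) = pt - p - t + 1$.

Rule 2. $pt - p - t + 1$:

$$+\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr} - \sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr} - \sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr} + \sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}$$

Rule 3. $pt - p - t + 1$:

$$\sum_{1}^{p}\sum_{1}^{t}\left(\sum_{1}^{c}\sum_{1}^{r} X_{pctr}\right) - \sum_{1}^{p}\left(\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}\right) - \sum_{1}^{t}\left(\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{r} X_{pctr}\right) + \left(\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}\right)$$

Rule 4. $pt - p - t + 1$:

$$\sum_{1}^{p}\sum_{1}^{t}\frac{\left(\sum_{1}^{c}\sum_{1}^{r} X_{pctr}\right)^{2}}{cr} - \sum_{1}^{p}\frac{\left(\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}\right)^{2}}{ctr} - \sum_{1}^{t}\frac{\left(\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{r} X_{pctr}\right)^{2}}{pcr} + \frac{\left(\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}\right)^{2}}{pctr}$$

The result of rule 4 above is the computational formula
for the sum of squares for the interaction of factors P and T.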

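2. Sum of squares for R:PCT.

Rule 1. df for R:PCT $= pct(r - 1) = pctr - pct$.

Rule 2. $pctr - pct$:

$$+\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr} - \sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}$$

Rule 3. $pctr - pct$:

$$\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r}\left(X_{pctr}\right) - \sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\left(\sum_{1}^{r} X_{pctr}\right)$$

Rule 4. $pctr - pct$:

$$\sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\sum_{1}^{r} X_{pctr}^{2} - \sum_{1}^{p}\sum_{1}^{c}\sum_{1}^{t}\frac{\left(\sum_{1}^{r} X_{pctr}\right)^{2}}{r}$$

The preceding formula yields the sum of squares for replications
(or “within”).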
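
Rules IV-B.2 through IV-B.4 are mechanical enough to be sketched in code. The following Python fragment is only an illustration, not part of the original procedure: the function name `sum_of_squares`, the dictionary layout, and the randomly generated scores are all assumptions made for the example. The design sizes (p = 2, c = 3, t = 5, r = 2) are those implied by the degrees of freedom of the example in Sec. 18.3.

```python
import itertools
import random


def sum_of_squares(X, dims, terms):
    """Build a computational formula from the expanded df (rules IV-B.2 to IV-B.4).

    X     : dict mapping an index tuple (one index per subscript) to an observation
    dims  : dict mapping each subscript letter to its number of levels, in order
    terms : list of (sign, letters) pairs from the expanded df, e.g. (+1, "pt")
    """
    letters = list(dims)
    total = 0.0
    for sign, outer in terms:
        # Rule 3: summations whose limits appear in the term stay outside
        # the parentheses; all others go inside with X.
        outer_pos = [i for i, letter in enumerate(letters) if letter in outer]
        n_inside = 1
        for letter in letters:
            if letter not in outer:
                n_inside *= dims[letter]
        sums = {}
        for index, x in X.items():
            key = tuple(index[i] for i in outer_pos)
            sums[key] = sums.get(key, 0.0) + x
        # Rule 4: square each parenthesized sum, divide by the number of
        # observations summed over, and attach the sign of the term.
        total += sign * sum(s * s for s in sums.values()) / n_inside
    return total


# Hypothetical scores for a 2 x 3 x 5 x 2 layout (made up for illustration).
dims = {"p": 2, "c": 3, "t": 5, "r": 2}
rng = random.Random(0)
X = {index: rng.gauss(50, 10)
     for index in itertools.product(*(range(n) for n in dims.values()))}

# df for PT = pt - p - t + 1; df for R:PCT = pctr - pct.
ss_pt = sum_of_squares(X, dims, [(+1, "pt"), (-1, "p"), (-1, "t"), (+1, "")])
ss_within = sum_of_squares(X, dims, [(+1, "pctr"), (-1, "pct")])
print(round(ss_pt, 3), round(ss_within, 3))
```

The same function should serve for the remaining four sources of variation once their df have been expanded by rule 1.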


18.9 

DETERMINING THE 
EXPECTATIONS OF MEAN 
SQUARES 


V-A. Notation 


1. The symbol $\sigma^2$, having to the left of the colon in its subscript only
lower case letters corresponding to random or finite factors, denotes
the variance of a random variable underlying those random and
finite factors. For example, $\sigma^2_{c:p}$ denotes the variance of the effects
associated with all the classrooms (C) included in the population of
classrooms found in a particular type of school (P).

2. The symbol $\sigma^2$, having included to the left of the colon in its subscript
lower case letters corresponding to fixed factors, denotes a
function of the sum of the squared effects of the variables represented
to the left of the colon.* For example, $\sigma^2_t$ denotes a function of
squared fixed effects associated with treatments, e.g.,
$\sigma^2_t = \sum_t \alpha_t^2 / (t - 1)$.


V-B. Rules 


1. Unless deleted by rule V-B.2 below, the expectation of a mean
square of any factor contains a $\sigma^2$ for each line in the ANOVA
table that has in its denotation all the letters denoting the mean
square under consideration.

2. Certain of the $\sigma^2$ components of an expectation for a mean square
given in V-B.1 above vanish according to the following rule: any
$\sigma^2$ having to the left of the colon a letter denoting a fixed classification
disappears except when the source of variation of the mean
square includes this letter. Remember, if there is no colon in the
subscript, the colon is by definition at the right of all letters.

3. The coefficient of a particular $\sigma^2$ in a particular mean square
includes the product of all the lower case letters not found in the
subscript of $\sigma^2$.

* Although it has become customary to adopt the $\sigma^2$ notation, the reader should keep
in mind that for factors or combinations of factors involving fixed factors, the $\sigma^2$ is not
a variance of a random variable; it is related to a sum of squared constants (the fixed
effects).
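
Rules V-B.1 through V-B.3 can also be stated as a short program. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the function name `expected_mean_squares` and the way each source of variation is encoded (letters to the left of the colon, letters to the right) are choices made here, and the design is the one from Sec. 18.3 with C and R random, P and T fixed, and p = 2, c = 3, t = 5, r = 2.

```python
def expected_mean_squares(sources, fixed, levels):
    """Apply rules V-B.1 to V-B.3.

    sources : dict, name -> (letters left of the colon, letters right of it)
    fixed   : set of lower case letters denoting fixed factors
    levels  : dict, letter -> number of levels
    Returns a dict, name -> {component name: coefficient}.
    """
    ems = {}
    for name, (left, right) in sources.items():
        target = set(left) | set(right)
        terms = {}
        for comp, (c_left, c_right) in sources.items():
            comp_letters = set(c_left) | set(c_right)
            # Rule V-B.1: the component must carry every letter of the
            # mean square under consideration.
            if not target <= comp_letters:
                continue
            # Rule V-B.2: a fixed letter to the left of the colon deletes the
            # component unless that letter belongs to the mean square itself.
            if any(l in fixed and l not in target for l in c_left):
                continue
            # Rule V-B.3: the coefficient is the product of the letters
            # (numbers of levels) not found in the component's subscript.
            coef = 1
            for letter, n in levels.items():
                if letter not in comp_letters:
                    coef *= n
            terms[comp] = coef
        ems[name] = terms
    return ems


sources = {
    "P": ("p", ""), "C:P": ("c", "p"), "T": ("t", ""),
    "PT": ("pt", ""), "CT:P": ("ct", "p"), "R:PCT": ("r", "pct"),
}
ems = expected_mean_squares(sources, fixed={"p", "t"},
                            levels={"p": 2, "c": 3, "t": 5, "r": 2})
for name, terms in ems.items():
    print(name, terms)
```

In this encoding the component named "R:PCT" is the replication variance, written simply as $\sigma^2$ in the text.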

V-C. Illustrations of the Rules
Under V-B (for the example
in Sec. 18.3)

One way of determining the expected values of mean squares is first to list all
possible $\sigma^2$, then to eliminate selected $\sigma^2$ by rule V-B.1, then eliminate more
$\sigma^2$ by rule V-B.2, and finally to attach coefficients to the remaining components
by rule V-B.3. This procedure will be followed here.


1. The expected mean square for any line could contain $\sigma^2_p$, $\sigma^2_t$, $\sigma^2_{c:p}$,
$\sigma^2_{pt}$, and $\sigma^2_{ct:p}$ as well as $\sigma^2_{r:pct}$ (written simply $\sigma^2$ below).
In the E(MS) for P, delete $\sigma^2_t$
because $\sigma^2_t$ does not have a p among its subscripts (rule V-B.1).

In the E(MS) for C:P, eliminate $\sigma^2_p$, $\sigma^2_t$, and $\sigma^2_{pt}$ because none
of these has both a c and a p among its subscripts (rule V-B.1).

In the E(MS) for T, eliminate $\sigma^2_p$ and $\sigma^2_{c:p}$ (rule V-B.1).

In the E(MS) for PT, eliminate $\sigma^2_p$, $\sigma^2_t$, and $\sigma^2_{c:p}$ (rule V-B.1).

In the E(MS) for CT:P eliminate all $\sigma^2$ except $\sigma^2_{ct:p}$ and $\sigma^2$ since
none contains all the letters c, t, and p in its subscripts (rule V-B.1).

In the E(MS) for R:CTP, eliminate all $\sigma^2$ except $\sigma^2$, since none
contains all the letters c, t, p, and r (rule V-B.1).

2. Recall that C and R are considered random, P and T fixed. In
addition to $\sigma^2$ the E(MS) for P so far contains $\sigma^2_p$, $\sigma^2_{c:p}$, $\sigma^2_{pt}$, and
$\sigma^2_{ct:p}$. Now $\sigma^2_p$ contains fixed factor P to the left of the colon, but
P is the mean square under consideration, so $\sigma^2_p$ stays.
$\sigma^2_{c:p}$ contains only random factor C to the left of the
colon, so $\sigma^2_{c:p}$ also stays. $\sigma^2_{pt}$ contains fixed factor T to the left of the
colon, and T is not part of the mean square under consideration, so
$\sigma^2_{pt}$ is eliminated. For the same reason $\sigma^2_{ct:p}$ is eliminated:
the fixed factor T is to the left of the colon.

The E(MS) for C:P so far contains only $\sigma^2$, $\sigma^2_{c:p}$, and $\sigma^2_{ct:p}$.
$\sigma^2_{c:p}$ is retained since C is random, and c is also
part of C:P. However, $\sigma^2_{ct:p}$ contains t to the left
of the colon and T is both fixed and not part of the mean square
under consideration (C:P). Thus $\sigma^2_{ct:p}$ is eliminated.

In addition to $\sigma^2$ the E(MS) for T so far contains $\sigma^2_t$, $\sigma^2_{pt}$, and
$\sigma^2_{ct:p}$. $\sigma^2_t$ is retained because t is part of T. $\sigma^2_{pt}$ is eliminated because
P is fixed and p is not part of T, the mean square under consideration.
$\sigma^2_{ct:p}$ survives because C is random, t is part of T, and p is to the right of the
colon, and thus not affected by rule V-B.2.

Similarly, it can be shown that the E(MS) for PT is $\sigma^2$, $\sigma^2_{ct:p}$, and
$\sigma^2_{pt}$, and the E(MS) for CT:P is $\sigma^2$ and $\sigma^2_{ct:p}$.

3. The coefficients of the surviving components are found by a straightforward
application of rule V-B.3. For example, the coefficient of
$\sigma^2_{c:p}$ is $(t \times r) = (5)(2) = 10$. See Table 18.10 for the final results.

TABLE 18.10 SUMMARY ANOVA TABLE (FOR THE EXAMPLE IN SEC. 18.3)

Source of variation    df                         E(MS)

P                      (p - 1) = 1                $\sigma^2 + 10\sigma^2_{c:p} + 30\sigma^2_p$
C:P                    p(c - 1) = 4               $\sigma^2 + 10\sigma^2_{c:p}$
T                      (t - 1) = 4                $\sigma^2 + 2\sigma^2_{ct:p} + 12\sigma^2_t$
PT                     (p - 1)(t - 1) = 4         $\sigma^2 + 2\sigma^2_{ct:p} + 6\sigma^2_{pt}$
CT:P                   p(c - 1)(t - 1) = 16       $\sigma^2 + 2\sigma^2_{ct:p}$
R:PCT                  pct(r - 1) = 30            $\sigma^2$

Total                  pctr - 1 = N - 1 = 59


18.10 

SIGNIFICANCE TESTING 

It is usually the researcher’s purpose to test one or more null hypotheses,
i.e., that particular variance components, fixed effects, or mixed effects are
zero. To test any one such hypothesis, one first identifies the source of
variation corresponding to the variance component, fixed effect, or mixed
effect in question. The mean square for this source of variation becomes the
numerator of the F-ratio, the test statistic used in testing the null hypothesis.
One then determines what the expected value of this mean square is when the
variance component, fixed effect, or mixed effect in question is zero. Call this
new expected value $E(MS \mid H_0 \text{ true})$. The denominator of the F-ratio is the
mean square that has the expected value $E(MS \mid H_0 \text{ true})$. In other words,
the null hypothesis is tested with the ratio of two mean squares such that
both would have the same expected value if the null hypothesis in question
were true. (The mean square in the numerator corresponds to the source of
variation being tested.)

Suppose in our example we were interested in the hypothesis that
$\sigma^2_p = 0$. Since the expected value of the mean square for P contains $\sigma^2_p$ and
since the expected value of the mean square for P equals the expected value of
the mean square for C:P when $\sigma^2_p = 0$,
the ratio of the mean square for P to the mean square for C:P is appropriate
for testing the hypothesis that $\sigma^2_p = 0$.
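
The search for a denominator described above can be sketched in a few lines of Python. This is an illustration only: the function name `f_denominator` and the dictionary encoding are assumptions made here, the coefficients are those of Table 18.10, and the key "R:PCT" stands for the replication variance $\sigma^2$.

```python
# E(MS) terms from Table 18.10: source -> {component: coefficient}.
EMS = {
    "P":     {"R:PCT": 1, "C:P": 10, "P": 30},
    "C:P":   {"R:PCT": 1, "C:P": 10},
    "T":     {"R:PCT": 1, "CT:P": 2, "T": 12},
    "PT":    {"R:PCT": 1, "CT:P": 2, "PT": 6},
    "CT:P":  {"R:PCT": 1, "CT:P": 2},
    "R:PCT": {"R:PCT": 1},
}


def f_denominator(ems, source):
    """Return the source whose E(MS) equals E(MS | H0 true) for `source`.

    Under H0 the component belonging to `source` is zero, so it is dropped
    from the expectation. None is returned when no mean square has the
    resulting expectation, i.e., when no exact F-ratio exists.
    """
    under_h0 = {comp: coef for comp, coef in ems[source].items() if comp != source}
    for other, terms in ems.items():
        if other != source and terms == under_h0:
            return other
    return None


for source in EMS:
    print(source, "is tested against", f_denominator(EMS, source))
```

For the example design every effect except replications has an exact denominator; a design in which the function returns None for some effect is one where the approximate methods mentioned below would be needed.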

If no appropriate F-ratio exists for testing an effect, approximate
methods are sometimes applicable. (See, e.g., Winer, 1962, pp. 199-202.)

In spite of the fact that the above significance testing procedures are
common practice, a somewhat more enlightened skepticism concerning the
distribution properties of F-ratios in mixed designs (i.e., having both fixed
and random factors) would seem to be called for by the considerations raised
in Sec. 18.2 concerning mixed model assumptions and in Scheffé (1959,
chap. 8), for example.


PROBLEMS AND EXERCISES 


1. A random sample of 10 judges was drawn from a population of judges.
Each judge rated a different random sample of n = 20 children on a 7-point
scale of “adjustment.” The analysis of variance yielded the following
results:

Source of variance     df       MS       E(MS)

Between judges           9     10.48     $\sigma^2 + 20\sigma^2_a$
Within judges          190      9.64     $\sigma^2$


b. a!ZV!l‘ ?! i“ d£ "- f '°»' (he above data. 

the same judged judges on the Wh ° din " S m ° rC ’ chi,drcn **'"2 ra,cd ^ 

a- HsmblJthe^Sr&aX’ *" ^ 

2. A measurement researcher is interested in the 50-item spelling tests that
could be formed from a pool of 30,000 items. The total number of possible
50-item tests is equal to the number of combinations of 30,000 items taken 50 at a
time, which is a staggeringly large number. The researcher constructs six 50-item
spelling tests by randomly sampling from the item pool. Thus, the
tests can be considered to have been randomly sampled from the population of all
possible 50-item tests. Each test is given to an independent random sample of
12 pupils. The following test scores (total number of correct spellings out of
50) are obtained:



a. Using the one-factor random-effects ANOVA model, calculate the estimate of the
variance among all possible spelling means, $\hat\sigma^2_a$, and the estimate of the
variance of pupils’ scores within any spelling test, $\hat\sigma^2$. That is, calculate

$$\hat\sigma^2_a = \frac{MS_a - MS_w}{12}, \qquad \hat\sigma^2 = MS_w.$$

b. Calculate the 95% confidence interval around $\hat\sigma^2_a/\hat\sigma^2$ on $\sigma^2_a/\sigma^2$.

3. In a two-factor random-effects ANOVA with I levels of factor A, J levels of
factor B, and n observations per cell, the expected values of the mean squares are
as follows:

$$E(MS_A) = \sigma^2 + n\sigma^2_{ab} + nJ\sigma^2_a,$$

$$E(MS_B) = \sigma^2 + n\sigma^2_{ab} + nI\sigma^2_b,$$

$$E(MS_{AB}) = \sigma^2 + n\sigma^2_{ab},$$

$$E(MS_w) = \sigma^2.$$

Find a linear combination of the mean squares that provides an unbiased
estimator of $\sigma^2_a$. [Hint: Notice how $\sigma^2_b$ can be estimated:

$$E\left[\frac{MS_B - MS_{AB}}{nI}\right] = \frac{E(MS_B) - E(MS_{AB})}{nI} = \frac{\sigma^2 + n\sigma^2_{ab} + nI\sigma^2_b - \sigma^2 - n\sigma^2_{ab}}{nI} = \frac{nI\sigma^2_b}{nI} = \sigma^2_b.]$$

4. It has often been maintained that neurologically handicapped children evidence 
a lower Performance IQ than Verbal IQ on the Wechsler Intelligence Scale for 
Children (WISC). Hopkins (1964) compared the Verbal and Performance IQ’s
for a group of about 30 children ranging in age from six years to 12 years who 
were diagnosed as neurologically handicapped independently of the intelligence 
testing. The following data were obtained: 


          Verbal   Performance              Verbal   Performance
Person      IQ         IQ         Person      IQ         IQ

   1        87         83           16        83         85
   2        80         89           17        83         77
   3        95        100           18        92         84
   4       116        117           19        95         85
   5        77         86           20       100         95
   6        81         97           21        85         99
   7       106        114           22        89         90
   8        97         90           23        86         93
   9       103         89           24        86        100
  10       109         80           25       103         94
  11        79        106           26        80        100
  12       103         96           27        99        107
  13       126        121           28       101         82
  14       101         93           29        72        106
  15       113         82           30        96        108

Consider “Persons” to be a random factor and “Verbal IQ vs. Performance
IQ” to be a fixed factor. Using the mixed-effects ANOVA model, test the
null hypothesis at the .05 level that populations of Verbal and Performance IQ
have the same mean for neurologically handicapped children. (The F-test will
lead to the same decision that would be arrived at if a dependent-groups t-test
of the hypothesis $\mu_1 = \mu_2$ were made.)

5. Scores on the Information, Vocabulary, Digit Span, and Block Design subtests
of the Wechsler Intelligence Scale for Children are tabulated below for a group of
12 neurologically handicapped children (Hopkins, 1964):


                          Test

Person    Infor.    Voc.    Dig. S.    Bl. Des.

   1         7        8        7          7
   2         5       10        8         12
   3         9       11        9         11
   4        17       18        9         13
   5         4        7        7          9
   6         6        9        8         11
   7        11       11        7          7
   8        10       14       12          7
   9         8       11        7         13
  10        12       11        5          9
  11        13       16        6         18
  12        11       10       11          5



WISC subtest scores are scaled to a mean of 10 and standard deviation of 3 for
the general population. It has often been asserted that patterns of subtest scores
on the WISC can be used to diagnose neurological handicaps. Test the null 
hypothesis that the twelve test scores above were randomly sampled from four 
normal distributions with the same mean. 

The design should be regarded as a repeated measures design; hence, in 
performing the F-test use the flow chart in Fig. 18.3. 


6. Twenty raters are drawn at random from a population of 10,000 raters. Thirty- 
two ratees are drawn at random from a population of 20,000 ratees. Eight traits 
are “drawn at random” from a population of eight traits (i.e., all traits of interest
to the investigator for this particular study are used). Each of the 20 raters rated
each of the 32 ratees once on each of the eight traits in a well-conducted study. 

a. How many ratings does this yield? 

b. Which factors are "random” and which “fixed"? Are any factors “nested" 
within any other factors? Which? 

c. How many sources of variation are there in these ratings? List them. 

d. Using the rules of thumb given in the chapter, work out all the expected mean 
squares for this design, indicate the appropriate ratio of mean squares and 
the degrees of freedom for each F, and provide the formulas for estimating 
all estimable components of variance. 

For another way to analyze data from this particular design, see Stanley
(1961b).

7. Two raters who were Democrats and two raters who were Republicans each
rated three ratees who were prominent Democrats and three ratees who were
prominent Republicans on each of four different traits. This yielded a total of
2 x 2 x 2 x 3 x 4 = 96 ratings. The design is shown below. There are five
factors, two of which are “nested.”




[Design table: rows are party of ratee and ratee; columns are trait rated
(Intelligence, Honesty, Friendliness, Generosity), party of rater, and rater number.]


a. Identify the two nested factors. Within what is each such factor nested ? 

b. Do the nested factors "cross” any other factors ? Which ones ? 

c. Which factors does the "political party of rater” factor cross? 

d. Which of the factors were probably considered as each having had its levels 
drawn randomly from an infinite (hypothetical) population of levels ? 

contributed most to variation of the ratings.
Would you have expected this result in advance of the rating procedure?
What does this interaction probably mean?
f. The second largest estimated component of variance was that for party of
rater x party of ratee x trait. What does this three-factor interaction mean?
(One sometimes sees a three-factor interaction referred to as a “second-order”
interaction, because a zero-order “interaction” would be a main effect, not
interacting with anything. Thus a two-factor interaction such as that in (e),
above, may be called a “first-order” interaction.)

For further results, see Stanley (1961a). 

FUNDAMENTALS OF EXPERIMENTAL DESIGN*


19.1

INTRODUCTION

The word “experimentation” has come to have many meanings for behavioral
scientists. The most common meaning might be termed “experientation,”
trying new approaches and subjectively evaluating their effectiveness. In
this chapter we are concerned with a more structured inquiry, akin to that
carried out in many of the sciences. It involves control by the experimenter
of at least one variable, such as method of teaching arithmetic, that he can
manipulate. Thus, experimentation of this kind differs from observation of
naturally occurring events in that the stage has been set by the experimenter
so that the possibly differential effects of at least two “treatments” can be
observed in a situation where assignment of experimental units (often, pupils
or classes) to the several treatments has been made without bias. “Nature”
almost always makes biased assignments to its treatments; even before a
natural experiment, the experimental units to be subjected to one treatment
are usually not comparable to those to be subjected to another treatment.

* For a simple, general approach to this topic see Stanley (1967b).

For example, as a group, smokers differ in many ways from nonsmokers, any
one of which ways might be the cause of a greater incidence of lung cancer
among smokers.

In brief, nature rarely assigns experimental units to treatments randomly, 
whereas the careful experimenter almost always does. One can define a 
controlled, variable-manipulating, comparative experiment as a study in 
which the available experimental units are assigned at random (either simply 
or restrictively) to the various treatments. More generally, if we consider 
each factor to be manipulated (e.g., several ways to teach arithmetic; sex; 
overt versus covert response in programmed instruction; or 100% reinforce- 
ment versus 50% reinforcement) as having two or more levels or categories 
(e.g., SMSG* curriculum versus two other ways to teach mathematics would 
constitute three levels of the teaching-of-mathematics factor), we can then 
talk about factor-level combinations, such as the six generated by three ways 
to teach mathematics crossed with two levels of sex (male-female). This is a
3 x 2 factorial design: one factor at three levels crossed with a second
factor at two levels.
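
Crossing can be pictured as forming every pairing of a level of one factor with a level of the other. A minimal Python sketch follows; the labels "method B" and "method C" are placeholders invented here for the two unnamed ways of teaching mathematics.

```python
import itertools

# Level labels other than SMSG are placeholders chosen for this illustration.
factors = {
    "teaching method": ["SMSG", "method B", "method C"],
    "sex": ["male", "female"],
}

# Crossing: every level of one factor meets every level of the other.
combinations = list(itertools.product(*factors.values()))

for method, sex in combinations:
    print(f"{method:10s} x {sex}")

print(len(combinations), "factor-level combinations")
```

Adding a third crossed factor to the dictionary multiplies the number of combinations by that factor's number of levels.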

The basic experimental design involves one or more factors that are 
either manipulated by the experimenter, as illustrated above, or not manipu- 
lated by him (e.g., male or female sex, day of the week, height above average 
or below average). The levels of one factor may be crossed with those of 
another, or the levels of one or more factors may be nested within the levels 
of another factor. For instance, when three male raters and three female 
raters rate each of 10 ratees on each of seven traits, raters are nested within 
sex, because no male rater is also a female rater (i.e., the rater levels do 
not cross the sex levels, though they do cross the ratee and the trait levels). 
You have already seen examples of nesting in Chapter 18, including Prob. 7 
at the end of that chapter. 

Another example would be an experiment in which the manipulated 
factors were immediate reinforcement versus delayed reinforcement and 50% 
reinforcement versus 100% reinforcement, all levels of one factor crossing 
all levels of the other, with the experimental subjects (i.e., persons) classified 
as male versus female, blond versus brunette versus redhead, and volunteers 
versus nonvolunteers. The subjects would be nested within the 12 “nests” 
created by the intersection of sex with hair color with volunteering. One- 
fourth of the subjects in each nest would be assigned randomly to a given 
factor-level combination of the manipulated factors. 

Hierarchical nesting occurs when some factors are subclasses of others;
e.g., cities could be nested within counties, counties within states, and states
within regions. An experimenter might assign cities at random to experimental
treatments. He would then have a partly nested and partly crossed
design, with the geographical units not crossing each other but crossing the
levels of the treatment factor. One then has no basis for concluding that a
certain city would have done relatively better in the experiment had it been in
another state.

* SMSG means “School Mathematics Study Group,” a group composed of mathematics
educators who pioneered in bringing modern mathematics to secondary-school
courses in the United States.


Nesting and crossing can occur only when there are at least two factors. 
In a given study there may be no nesting, no crossing, all of one or the other, 
or a mixture of nesting and crossing, as in the above examples. Note, though, 
that nesting rarely occurs when all the factors are manipulated. 

The term “factorial design” is taken by some statisticians to mean a
fully crossed set of two or more factors, with an equal or unequal number of
experimental units assigned completely randomly to each of the factor-level
combinations. (See Kendall and Buckland, 1957, p. 106.) For one factor,
with random assignment of $n_j$ experimental units to each of the J levels of the
factor, there would be no crossing, of course. Other definitions of factorial
design include cases where only restricted assignment of the experimental
units is possible, as when the N available experimental units must be considered
as having been drawn from two or more different populations (e.g.,
men versus women); each such population defines a level of a nonmanipulated
factor. (See Stanley, 1967, pp. 204-205, for an argument about whether
randomized-block designs are factorial designs. We shall see subsequently
that they involve full crossing of factor levels, but with
restricted assignment of the experimental units.)


19.2

AN EXAMPLE


Consider an experiment involving four styles of printing type crossed with three
sizes of printing type, a 4 x 3 factorial design giving 12 factor-level
combinations. Each style is tried with every size, and each size with each style.
This is a complete design. If we take 12 pupils and assign one at random to each
of the 12 combinations, we produce one replicate of the design. As a minimum, we
shall assign 24 pupils, two at random to each of the combinations, and thereby
create two replicates. More generally, we assign n pupils to each combination.
Some number of replicates will yield the power we require. Methods for
determining n exist.
>,C ' d ,he l»»“ « require. Method, for 

How do we conduct this factorial-design experiment involving four styles
crossed with three sizes? Of course, we must decide which styles and sizes
to use. Probably we have firmly in mind four styles that are candidates for
use in the textbook or test we plan to prepare. We also know which three
sizes to try for our purposes, perhaps 8 point, 12 point, and 16 point.
These four styles and three sizes produce 12 factor-level combinations that
are the only ones of interest to us in this experiment. In the jargon of experimental
design, we have two fixed-effects factors and therefore employ a
fixed-effects model, because we have “drawn” the four styles from a target
population of just four styles and the three sizes from a target population of
just three sizes.

Where do we get the experimental units with which to do the experiment?
We might secure a “grab-group” consisting of the first 12n individuals
who happen our way, or we might define a population of individuals, such as
all fourth graders in a large school system, and draw 12n individuals at random
from that population. Using a grab-group limits rigorous statistical
generalization from the outcome of the experiment to just those 12n persons, 
whereas drawing the experimental units randomly from a population permits 
statistical generalization to that population, thereby increasing external 
validity, i.e., generalizability. (We may, however, be able to generalize 
nonstatistically from the members of the grab-group to other persons “like 
them” if we know enough about the adventitiously chosen individuals to be 
reasonably sure that none of their characteristics determining the outcome 
of the experiment differ enough from those of the target population to change 
the results there. This is difficult to ascertain, and in any event we have no 
probabilistic warrant for generalizing from the grab-group to anyone else 
whatsoever.) 

After we secure the 12n experimental units by one of the above methods,
we randomly assign n of them to each of the factor-level combinations, which 
have common content but different combinations of style and size. We 
wish to vary only style and size. All other variables should be held constant 
(as for example, by using just males in the experiment, thereby keeping the 
sex factor at one level) or randomized over all 12 factor-level combinations. 
This is where experimental control becomes crucial. It might be practicable, 
for instance, to seat all 12n pupils randomly in the room. (That could be 
done by passing out random seat assignments at the door as the pupils 
arrived.) If a nonrandom room arrangement were used, this would have to 
be considered as part of the design, making it more complex than a 4 x 3 
factorial. Control of extraneous variables calls for great care and ingenuity 
so that the experiment will be internally valid, i.e., produce comparisons that 
are free from bias. 
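
The random assignment of n units to each of the 12 factor-level combinations can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_at_random`, the style labels, the pupil labels, and the seed are all hypothetical, and n = 2 is used only because it is the two-replicate minimum mentioned earlier.

```python
import itertools
import random


def assign_at_random(units, combinations, n, seed=None):
    """Randomly assign exactly n experimental units to each combination."""
    if len(units) != n * len(combinations):
        raise ValueError("need exactly n units per factor-level combination")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    return {combo: shuffled[i * n:(i + 1) * n]
            for i, combo in enumerate(combinations)}


styles = ["style 1", "style 2", "style 3", "style 4"]  # placeholder names
sizes = [8, 12, 16]                                    # point sizes from the text
combos = list(itertools.product(styles, sizes))        # the 4 x 3 = 12 combinations

pupils = [f"pupil {k}" for k in range(1, 25)]          # 12n pupils with n = 2
assignment = assign_at_random(pupils, combos, n=2, seed=19)

for combo, group in assignment.items():
    print(combo, group)
```

Shuffling the whole list once and cutting it into consecutive groups of n gives each pupil the same chance of landing in any combination, which is the property the text relies on for freedom from bias.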

After the 12n pupils have read the same passage with the same time limit
under the same conditions except for style and size of printing type, they will
be given a common test to determine how much each learned from the
particular combination of style and size of type to which he was exposed.
The total score of each pupil on this outcome test will constitute the observations
on the dependent variable to be analyzed.


19.3 

ADVANTAGES AND 
DISADVANTAGES OF THE 
FACTORIAL DESIGN 


Complete, balanced factorial designs (designs where each of the possible
factor-level combinations occurs and has n ≥ 1 experimental units assigned
to it) permit testing more than one hypothesis about main effects (e.g., the
influence of type size or the influence of style) efficiently in the same experiment.
They also make it possible, where two or more factors are used, to
study how the factors interact. Perhaps the least effective of the three sizes
when combined with the least effective of the four styles does not produce the
least effective of the twelve factor-level combinations. If the effects are
additive, so that knowing the effectiveness of a certain size factor level and a
certain style factor level permits one to predict the effectiveness of that
factor-level combination, we say that the two factors do
not interact. One cannot study interaction statistically in one-factor studies.

In factorial designs, bias is
avoided (in the probabilistic sense) by the random assignment of the experimental
units to the factor-level combinations. Random assignment, however, permits
the within-factor-level-combination variability to be as great as true-score
variation among individuals treated alike and errors of measurement dictate.
Thus the signal (the genuine effects) may be obscured by the noise (the
within-combination variability) if the signal-to-noise ratio is low. This noise
(or error, as it is usually called) lowers the power of the significance tests used
and increases the width of the confidence intervals computed. True-score
error can be lessened by blocking, stratifying,
leveling, and covarying.

19.4

BLOCKING


Of course, one can often lower within-factor-level variability of the outcome
measures by drawing the experimental units from a homogeneous subpopulation,
e.g., persons all the same age, sex, IQ, and socio-economic
level. This may reduce “error” considerably, but it does so at the expense of
limiting the generalizability of the findings to other persons of the same age,
sex, IQ, and socio-economic level as the ones used in the experiment. It
will usually be better to set up explicitly as factors in the experiment those 
characteristics thought likely to be most closely related to the outcome 
measure(s) of the study — i.e., to the dependent variable(s). This factorializa- 
tion will reduce error almost as well as would the subpopulation method. 
One can then test the interactions of the status variables with the manipulated 
variable(s) to determine whether or not the findings can be generalized over 
age groups, sexes, etc. Indeed, the thoughtful experiment designer often has 
his cake and eats it, too, as Ronald Fisher showed convincingly long ago. 

One of the earliest crossed classifications developed by Fisher for 
agricultural research was the randomized-block design. (See Fisher, 1925, 
pp. 226-29.) If V different varieties of wheat were to be planted in a field, he 
suggested that the field first be divided into B blocks of ground, the fertility 
within a given block being as homogeneous as possible. Then each block 
was divided into V plots, one for each variety of wheat. One variety was 
assigned at random to each plot within a given block, so that on each block 
all V varieties were planted once each. This, then, produced B x V factor- 
level combinations and BV observations when the wheat matured. See 
Table 19.1 for the layout of the observations. 

Note that varieties were assigned randomly to plots within blocks. 
Fisher showed that this randomization was crucial. Also, the B blocks in 
the experiment might be considered a random sample drawn from a hypo- 
thetical population of blocks “like these,” whereas the V varieties were 
probably the “target population” of varieties of wheat, i.e., all the varieties
in which the experimenter was interested. This is a mixed model: random
block effects and fixed variety effects. The E(MS)’s shown in Table 19.1 are
such that

$$MS_{\text{varieties}} / MS_{\text{blocks} \times \text{varieties}}$$

is distributed as

$$F_{V-1,\,(B-1)(V-1)}$$

under the null hypothesis that the V varieties produce equal quantities of
wheat.
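
The F-ratio for varieties in a randomized-block layout with one observation per plot can be computed directly from the B x V table. The sketch below is illustrative only: the function name `randomized_block_f` is an assumption, and the yields are made-up numbers for B = 4 blocks and V = 3 varieties.

```python
def randomized_block_f(table):
    """F-ratio for varieties; table[b][v] is the observation on block b, variety v."""
    B, V = len(table), len(table[0])
    grand = sum(map(sum, table)) / (B * V)
    block_means = [sum(row) / V for row in table]
    variety_means = [sum(table[b][v] for b in range(B)) / B for v in range(V)]

    # Sum of squares between varieties.
    ss_var = B * sum((m - grand) ** 2 for m in variety_means)
    # Blocks x varieties interaction, which serves as the error term here.
    ss_int = sum((table[b][v] - block_means[b] - variety_means[v] + grand) ** 2
                 for b in range(B) for v in range(V))

    df_var, df_int = V - 1, (B - 1) * (V - 1)
    return (ss_var / df_var) / (ss_int / df_int), df_var, df_int


# Hypothetical yields: rows are blocks of ground, columns are varieties.
yields = [
    [20, 24, 22],
    [30, 35, 31],
    [25, 28, 28],
    [15, 19, 17],
]
F, df1, df2 = randomized_block_f(yields)
print(F, df1, df2)  # F is about 24 on 2 and 6 df for these made-up yields
```

Note that the large differences among the block means in these data do not enter the denominator at all, which is the point of blocking.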

By choosing homogeneous parts of the field (the blocks) rather than 
merely assigning the varieties wholly at random throughout the field, Fisher 
removed the between-block variance from the mean square for error and

TABLE 19.1 OUTLINE OF DATA FROM A RANDOMIZED-BLOCK DESIGN

                           Variety of wheat
Block of ground       1           2          ...        V

1                  $X_{11}$    $X_{12}$      ...     $X_{1V}$
2                  $X_{21}$    $X_{22}$      ...     $X_{2V}$
...
B                  $X_{B1}$    $X_{B2}$      ...     $X_{BV}$

Source of variation                    df                  E(MS)

Between blocks ($\alpha$)              B - 1               $\sigma^2 + V\sigma^2_\alpha$
Between varieties ($\beta$)            V - 1               $\sigma^2 + \sigma^2_{\alpha\beta} + B\sigma^2_\beta$
Blocks x varieties ($\alpha\beta$)     (B - 1)(V - 1)      $\sigma^2 + \sigma^2_{\alpha\beta}$


increased the power of the significance test for varieties. He was not much 
interested in $MS_{\text{blocks}}$, except that it should be as large as possible under the
conditions of a particular experiment. The general principle of homogeneous 
subsorting of experimental material can be extended beyond agriculture to a 
number of situations of great interest to behavioral scientists. In some of 
these, blocks cannot reasonably be considered a random-effects factor. Also, 
in many of them there will be more than one observation per factor-level 
combination. (Fisher allowed for that in some situations, but the more 
plots there were per block, the less the within-block homogeneity was in 
many agricultural situations.) 

Our discussion below is based on three scales of measurement: nominal-scale
factor, which we call a blocking variable; ordinal, which we call
stratifying; and interval or ratio, which we call leveling. These three terms
are our own coinage for this situation. We believe that they serve behavioral
science more usefully than does the single, ancestral expression “randomized-block
design.” Blocking, stratifying, and leveling variables are classificatory,
rather than manipulated. They are antecedent to the beginning of the experiment
itself, as for example when the field was divided into blocks before the
varieties were planted on plots within these blocks. Let us now consider each
of these three types of factorialization that are useful for reducing error and
improving generalizability.
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Blocking on a Nominal-Scale 
Variable 


A familiar example of blocking is the classification of each pupil as beingeither 
male or female and the introduction of this two-level factor explicitly into the 
experimental design. Physiological sex is not a manipulated variable, but in 
the experimental design it is treated statistically like the manipulated factors. 
With two levels of sex, three levels of type size, and four levels of type style,
one has 2 × 3 × 4 = 24 factor-level combinations and needs 24n experimental
units.
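
The count of factor-level combinations can be checked by enumerating them. The following is a minimal sketch; the level labels (taken partly from the type sizes mentioned in Sec. 19.5) and the helper name units_needed are illustrative assumptions, not part of the original design.

```python
from itertools import product

# Hypothetical labels for the levels of each factor.
sexes = ["female", "male"]                               # blocking factor (classificatory)
sizes = ["8 point", "12 point", "16 point"]              # manipulated factor
styles = ["Style 1", "Style 2", "Style 3", "Style 4"]    # manipulated factor

# Every factor-level combination of the 2 x 3 x 4 design.
cells = list(product(sexes, sizes, styles))

def units_needed(n_per_cell):
    """Total experimental units when each combination receives n units."""
    return n_per_cell * len(cells)

print(len(cells))        # number of factor-level combinations
print(units_needed(5))   # units required with n = 5 per combination
```

With n = 5 pupils per combination, for instance, the design calls for 24 × 5 = 120 pupils.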

Main effects of sex, size, and style can be estimated. Also, one can 
study the interaction of sex with size, sex with style, and size with style, and 
the three-factor interaction of sex, size, and style. If the main effect of sex is 
significant, then by having a sex factor one has reduced the error variance 
significantly. If sex interacts with either or both of the manipulated factors, 
then by having sex as an explicit factor one has learned how to limit his 
generalizations appropriately. For example, it might be discovered that 
women find Style 3 easiest to read, whereas men find Style 1 easiest. When 
pursued further, this might have practical consequences for the design of 
textual materials. 

Another example of blocking is the use of identical twins in an experiment 
where one factor is manipulated at two levels. Twin A of each pair is assigned 
at random to a level of the manipulated factor, and twin B of that same pair 
then receives the other level. With P pairs of twins, one has 2P experimental 
units. The three sources of variation are between twin pairs, between the 
two treatments, and interaction of pairs with treatments. Note that this is 
just one replicate, so no direct statistical test of the interaction is afforded. 

If the variation among the twin pairs is significant, one has reduced the error 
term significantly, but because one sacrifices half his degrees of freedom for 
error in so doing, the power of the statistical test of the two-level treatment 
effect may not be improved or may even be lessened. In this design, the 
factor “twin pairs” would almost surely be regarded as random, since one 
would probably want the comparison of A and B to be generalized to a 
population of twins. Incidentally, this is a social- or biological-science 
version of Fisher’s randomized-block design. The twin pairs are unordered, 
having been “measured” on a nominal scale. 
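
The twin design can be sketched in a few lines. This is an illustration under stated assumptions: the function names and treatment labels are hypothetical, and the last entry of the degrees-of-freedom summary refers to the comparison case in which the same 2P persons are treated as two independent groups, so that the pairing is ignored.

```python
import random

def assign_twin_pairs(n_pairs, seed=None):
    """Randomly give one twin of each pair one treatment; the co-twin gets the other."""
    rng = random.Random(seed)
    plan = []
    for pair in range(n_pairs):
        first = rng.choice(["treatment 1", "treatment 2"])
        second = "treatment 2" if first == "treatment 1" else "treatment 1"
        plan.append({"pair": pair, "twin A": first, "twin B": second})
    return plan

def degrees_of_freedom(n_pairs):
    """Partition of the 2P - 1 total degrees of freedom for one replicate."""
    return {
        "between pairs": n_pairs - 1,
        "between treatments": 1,
        "pairs x treatments (error)": n_pairs - 1,
        "total": 2 * n_pairs - 1,
        # For comparison only: error df if the pairing were ignored.
        "error, pairing ignored": 2 * n_pairs - 2,
    }

plan = assign_twin_pairs(10, seed=3)
df = degrees_of_freedom(10)
```

With P = 10 pairs the error term has 9 degrees of freedom rather than 18, which is the sense in which half the degrees of freedom for error are sacrificed.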


Stratifying on Ordinal-Scale 
Variables 


We call an ordinal-scale variable used as an explicit factor in the experi-
mental design, such as socio-economic status of each experimental unit, a
stratifying variable. There might be five levels, such as high, upper-middle,
middle-middle, lower-middle, and low socio-economic status, creating a five-
level classificatory (i.e., not manipulated) factor.


Leveling on Interval- or 
Ratio-Scale Variables 

An interval or nearly interval scale or a ratio scale can be used to yield what 
is called a leveling variable. To reduce within-factor-level-combination 
variability one may group the experimental units before the experiment begins 


TABLE 19.2 SCHEMA OF DATA FOR THE LEVELING DESIGN, WITH REPLICATION: X_lti
NOTATION, l = 1, 2, ..., L LEVELS, t = 1, 2, ..., T TREATMENTS, AND i = 1, 2, ..., n
INDIVIDUALS FOR EACH TREATMENT WITHIN EACH LEVEL
on something, such as measured reading comprehension or height, that is
expected to correlate well within treatments with the outcome measure of the 
experiment. If there are T levels of a treatment factor and LT experimental 
units, one would arrange the experimental units from highest to lowest on the 
pre-measured factor into L levels. Within each such level, one experimental 
unit would be assigned at random to each of the T treatments (i.e., the T 
levels of the manipulated variable), creating one replicate of an L x T 
design. If the measures of the leveling factor do correlate significantly 
greater than zero with the outcome measures, then (as in the twin design 
outlined above) the within-treatment variability will be reduced significantly. 

Alternatively, one might choose to use N = nLT experimental units,
where n is greater than 1. (In the above paragraph, n = 1.) Then one
would group the N experimental units, from highest to lowest, into L =
N/(nT) sets, and would assign at random n experimental units to each treat-
ment within each level. This would permit testing the interaction of levels
with treatments, which the n = 1 design does not allow directly. See Table
19.2 for an outline of this design.
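
The grouping and randomization just described can be sketched as follows. This is one possible procedure, not a prescribed one; the function name, its parameters, and the premeasure scores are hypothetical. Setting n_per_cell to 1 gives the unreplicated L x T design.

```python
import random

def leveling_assignment(premeasures, n_treatments, n_per_cell=1, seed=None):
    """Return (unit_index, level, treatment) tuples for a leveling design.

    premeasures holds one score per experimental unit on the leveling
    variable; N must equal n * L * T for a whole number of levels L.
    """
    rng = random.Random(seed)
    block_size = n_per_cell * n_treatments       # nT units in each level
    n_units = len(premeasures)
    if n_units % block_size != 0:
        raise ValueError("N must be a multiple of n * T")
    n_levels = n_units // block_size             # L = N / (nT)

    # Arrange the units from highest to lowest on the premeasure.
    order = sorted(range(n_units), key=lambda i: premeasures[i], reverse=True)

    plan = []
    for level in range(n_levels):
        members = order[level * block_size:(level + 1) * block_size]
        # n copies of each treatment label, shuffled within the level.
        labels = [t for t in range(n_treatments) for _ in range(n_per_cell)]
        rng.shuffle(labels)
        for unit, treatment in zip(members, labels):
            plan.append((unit, level, treatment))
    return plan

# Hypothetical premeasures for N = 12 units; T = 3, n = 2, hence L = 2.
scores = [52, 71, 64, 88, 45, 93, 77, 60, 58, 81, 69, 49]
plan = leveling_assignment(scores, n_treatments=3, n_per_cell=2, seed=1)
```

Each of the L x T cells receives exactly n units, and every unit in a higher level outscores every unit in a lower level on the premeasure (ties aside).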


19.5 

ORDERED LEVELS OF FACTORS 

If one has three equally spaced sizes of printing type, such as 8, 12, and 16 
point, he has three equally spaced levels of an ordered factor. A significant 
trend for size of type might be linear, representing an equal increment (or 
decrement) as one goes from 8 point type to 12 point type and from 12 point 
type to 16 point type. The trend might be quadratic (i.e., second-order), as 
when 8 and 16 are equally effective but 12 is much better, or it might combine 
both linear and quadratic components. 
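
For three equally spaced levels, the linear and quadratic components are commonly isolated with the orthogonal-polynomial coefficients (-1, 0, 1) and (1, -2, 1). The sketch below applies them to two sets of made-up means; the means and the function name are assumptions for illustration only.

```python
# Orthogonal-polynomial coefficients for three equally spaced levels.
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)

def contrast(means, coefficients):
    """Weighted sum of the level means for one trend component."""
    return sum(c * m for c, m in zip(coefficients, means))

# Hypothetical mean reading scores for 8, 12, and 16 point type.
purely_linear = [10.0, 13.0, 16.0]      # equal increments from level to level
purely_quadratic = [10.0, 16.0, 10.0]   # 8 and 16 point equal, 12 point better

for means in (purely_linear, purely_quadratic):
    print(contrast(means, LINEAR), contrast(means, QUADRATIC))
```

A set of means with equal increments yields a zero quadratic contrast, whereas means in which the two extreme sizes are equally effective yield a zero linear contrast.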

Style of printing type, you note, is a nominal-scale factor, not ordered. 
It is quite possible to have two or more ordered factors in the same study, 
however, as for example if one introduced five equally spaced weights of paper
into the print study; the various trends that result could then be evaluated. 
(E.g., see Winer, 1962, and Edwards, 1968.) 


19.6 

RANDOM SELECTION OF 
FACTOR LEVELS 

Earlier in this chapter it was hinted that the four styles of printing type might 
have been drawn at random from a larger target population of printing styles 
to which one wished to generalize. If that population contained, say, 40 
styles, then the four drawn would be 10% of the entire population, a small but 
hardly negligible percentage. If, on the other hand, one drew four schools 
out of a target population of 4000 schools, the one-tenth of 1% that the
schools in the experiment constitute of the population would be tiny, so one
might choose to consider that essentially the four schools had been drawn
from an infinite population of schools, in which case one would (using the
jargon of the field) say that the schools are a random-effects factor.

If one has both fixed-effects and random-effects factors in an experiment,
we say that he should use a mixed-model analysis of his results. (See Chapter
18 for the mixed-effects ANOVA model.) Genuine random-effects factors
seem rare in educational and psychological research, but often we choose to
[...] a factor such as teachers, schools, or [...] levels were drawn randomly
from a virtually infinite population of [...]
be easier than that of the analogous “natural experiment.” A more familiar
example than the occupational one is investigation of the effects on general 
English vocabulary of studying Latin in high school. If students elect (i.e., 
volunteer) to take Latin or not to take it, the inputs for the two conditions 
will almost always be substantially different, the students taking Latin being 
better initially on English vocabulary, IQ, and a host of other cognitive and 
affective variables. If, however, half the prospective enrollees in Latin could 
be assigned at random to Latin in the ninth and tenth grades and the other 
half in, say, the eleventh and twelfth grades, nonreactively so that dis- 
appointment and frustration did not upset the operation of the school, it 
should be possible to compare both groups unbiasedly at the end of the tenth 
grade, after half had completed two years of Latin and the other half had 
taken none. Because of the random assignment, there would be no systematic 
confounding of any antecedent variables with the experimental variable (i.e., 
took Latin versus did not take it). Results should be much more readily 
interpretable than in the natural experiment. 

One has only to recall the great difficulties encountered by statisticians 
when analyzing the results of the vast natural-smoking experiment that has 
been going on for many years. Is cigarette smoking one of the potent 
"causes” of lung cancer? Of other ills? After much comparison of human 
subgroups and animal experimentation in order to discredit plausible 
alternative hypotheses, most researchers in this area have concluded that 
smoking cigarettes does increase the probability that a person will develop 
lung cancer and have certain other ailments, but because the work with 
humans was not controlled experimentation, no proof overwhelmingly 
convincing to all intelligent persons has yet been provided. Indeed, associ- 
ational analyses cannot eliminate all plausible alternative hypotheses, whereas 
controlled experiments if conducted impeccably can rule out all systematic 
ones, leaving only chance fluctuations (usually of quite low probability and 
largely under the experimenter's control) as the alternative explanation. This 
is not to say that a single experiment can be definitive or perfect; none ever is. 
Often an experiment will raise more new questions than it answers old ones, 
but at least the process of randomized assignment of experimental units to 
factor-level combinations removes the chief source of systematic bias that 
afflicts most natural experiments. 

Control may exact a high price in terms of lowered external validity 
(i.e., generalizability), however. For example, one probably cannot assign 
persons at random to unidentified-flying-object (UFO) versus non-UFO 
clubs and preserve the sense of the distinction as it occurs naturally. One 
might try paying some persons to smoke and others to refrain from smoking, 
but very likely this would not simulate well enough the natural situation 
where persons of certain temperaments and backgrounds cannot resist 
smoking several packages of cigarettes daily, whereas other types of individuals
are not tempted. One cannot very well assign occupations randomly in a
meaningful way. Even assigning Latin versus no Latin to eligible high-school 
students by deferring this subject for half of them has never been done, so 
far as we are aware. However, this technique of delaying treatment for a 
randomly selected group of control subjects has been employed frequently in 
evaluating the effects of psychotherapy.

But in general, researchers in the behavioral and social sciences and 
education have not made use of the simple and powerful expedient of random 
assignment to comparison groups even when to do so would be uncomplicated.
[...] experiments involving cursive [...]
Fisher developed the analysis of variance and covariance as a procedure for
analyzing the results from factorial designs of many types. Most reluctantly, 
we have not discussed the analysis of covariance (ANCOVA) in this textbook 
because to have done so systematically and thoroughly would have required 
many pages and made an already-long book appreciably longer. For 
experimentation, the basic principle of ANCOVA is that there are measures
of one or more antecedent variables, i.e., measures secured before the random
assignment of experimental units to treatments is made. These are chosen
with the hope that the regression of the outcome measures on these antecedent
measures will be considerable (i.e., that the linear association of the X's, the
premeasures, with the Y's, the postmeasures, will be appreciable).

In effect, an analysis of variance is performed on the (Y − Ŷ)'s, where
the Ŷ's are predicted from the X's in the usual b₁X + b₀ way described earlier
in this book. Some complexity arises because, even with only a single
antecedent measure, one has at least two levels of the treatment factor, and
hence at least two columns of Y's. With two or more antecedent measures
multiple-regression methods must be used, because one is securing the best-
weighted composite of the several antecedent variables for predicting the Y's.

A hypothetical illustration of use of the analysis of covariance may make
its purpose clearer. Suppose that one is studying four different ways to teach
computation of one-way ANOVAs. As his antecedent variables the
experimenter has, for each person later to be used in the experiment, a verbal-
aptitude score (V), a quantitative-aptitude score (Q), and previous grade-
point average (G) in quantitative courses. He assigns n (preferably n =
N/4) of the persons randomly to each of the four teaching methods, without
any reference to their test scores or GPAs. Then he carries out the experiment
and at its end administers to all persons a test of computing one-way ANOVAs;
this yields the outcome measures, Y's. Finally, he performs an analysis of
covariance on the data to determine whether, after he has used the ante-
cedent information statistically, the adjusted means of the four methods differ
significantly. If the regression of the final-test scores on the three predictors is
significant, his adjusted within-method mean square will be smaller than if he
had not used this antecedent information, thus giving him a more powerful
significance test and permitting smaller confidence limits to be constructed
around differences between method means. The data layout is sketched in
Table 19.3.
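
The adjustment of method means can be illustrated in a deliberately simplified case: one antecedent measure X instead of three, and two methods instead of four. The sketch below uses the pooled within-method regression slope and the usual adjusted-mean formula (method mean of Y minus the slope times the deviation of the method mean of X from the grand mean of X). The data and all names are hypothetical, and no significance test is carried out.

```python
from statistics import mean

def adjusted_means(groups):
    """groups maps a method name to a list of (x, y) pairs.

    Returns the pooled within-method slope of Y on X and the method
    means of Y adjusted to the grand mean of X.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sum_xy = sum_xx = 0.0
    stats = {}
    for name, pairs in groups.items():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        stats[name] = (x_bar, y_bar)
        # Accumulate within-method sums of products and squares.
        sum_xy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sum_xx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sum_xy / sum_xx
    adjusted = {name: y_bar - slope * (x_bar - grand_x)
                for name, (x_bar, y_bar) in stats.items()}
    return slope, adjusted

# Hypothetical (premeasure, final-test) scores for two teaching methods.
data = {
    "method A": [(1, 3), (2, 4), (3, 5)],
    "method B": [(3, 6), (4, 7), (5, 8)],
}
slope, adjusted = adjusted_means(data)
print(slope, adjusted)
```

In these made-up data the unadjusted means differ by 3 points, but method B's persons also began 2 points higher on the premeasure; once that initial advantage is taken into account, the adjusted means differ by only 1 point.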

For details of such analyses (especially of the case where there is just one 
predictor variable) see Edwards (1968), Brownlee (1965, pp. 376-96), Winer 
(1962), McNemar (1962), Lindquist (1953), and Scheffé (1959). Also, for a
fair dice and independent tosses, of 15 straight “passes” in craps is irrelevant
in a crap game with loaded dice. Similarly, the probability of a type I error
may be substantially incorrect if the data of the experiment do not conform to
the statistical-mathematical model from which that probability was derived.
We are here concerned primarily with the assumption of independence of the
replications of a comparative experiment.


Illustration of the Invalidating 
Effect of Nonindependence of
Replications of an Experiment 
represents an independent opportunity for the occurrence of an event (namely,
that pupil B exceeds pupil A) that has probability 1/2. Actually, each matched
pair is an experiment: one pupil is taught by method A; the other, by method
B. One pair of pupils is an unreplicated experiment. Observation of the
relative superiority of methods A and B on several pairs of pupils constitutes
replication of the experiment. There are as many replications of the experi-
ment as there are matched pairs whose performance is observed. Our
researcher made the assumption (gratuitous, perhaps) that the replications
of his experiment were independent.

Let us examine this assumption by digging beneath the outward
appearances of the experiment (the obtained data) down to the dynamics of
the experiment that produced the data. Is it plausible that if pupil B scored
higher than pupil A in pair 1 that this does not affect the chances of pupil B
scoring higher than pupil A in pair 2? As most classrooms are now
constituted it does not seem plausible. Recall that the 120 pupils were placed
into four separate and intact classrooms. Anyone who knows anything about 
instruction knows that the members of an intact class interact during 
instruction in a way that enhances or interferes with the learning in the group. 

Again let us exaggerate so as to make the point more clearly. In one 
sense we will exaggerate by describing a sort of nonindependence of replica- 
tions that is far stronger than any that would be met in practice. In another 
sense we will exaggerate, or “fantasize” might be a better word, by assuming 
that the researcher knows how the nonindependence operates, which he 
seldom will know. Suppose that in the group of 120 pupils there are two
troublemakers who are so obnoxious and disruptive that they will success- 
fully inhibit the instruction in any class of which they are a part. These two 
bait their teachers, annoy their classmates, and generally cause a ruckus. If 
a method A classroom has one of these troublemakers and the matched 
method B classroom does not, he will so depress the scores on the final pro- 
ficiency test that each of his classmates in method A will perform more 

[...] both of these troublemakers should happen to be assigned to the same
classroom (Heaven forbid!), our experimenter made certain that one of them
would be assigned at random to either classroom I or II and the other to either
classroom III or IV. In any event, the assignment of the pupils to either
method A or B was made at random, so that the comparison was not
intentionally biased.

In light of what we now know about the activities in the classrooms that
produced the data, let us calculate the probability of the obtained results
[... 60 ... method B ... null ...]. In the random assignment of the single
troublemaker to either method A or method B for classrooms I and II, there
is probability 1/2 that he will be assigned to classroom I (and cause its 30
pupils to do more poorly than their 30 matched pairs in classroom II) and a
probability of 1/2 that he will be assigned to classroom II. The same con-
siderations apply to the assignment of the second troublemaker to either
classroom III or classroom IV. The random assignments of the two trouble-
makers to classrooms are independent. Hence there are four possible
outcomes of the experiment (see Fig. 19.2). (Recall that the presence of a
troublemaker in a class will cause the 30 members of that class to do more
poorly than their matched partners in the class receiving the other treatment.)

Each of the four outcomes is equally likely. The probability of any one
of them is 1/4. In particular, outcome 1 will occur one-fourth of the time
this same experiment is executed. Consequently, given that methods A and B
are equally effective, the probability is not (1/2)^60 that the B pupil will perform
better than the A pupil in all 60 matched pairs (as our researcher originally
calculated); the probability of this overwhelming apparent superiority of method
B is 1/4 when, in fact, A and B are equally effective. The probability of falsely
concluding that method B was superior to method A was 1/4 instead of
(1/2)^60. In fact, the experiment was no more sensitive to the discovery of
superiority of one method than an experiment in which only two independent
matched pairs of pupils were used.
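
The two probabilities can be set side by side by enumerating the four equally likely placements of the troublemakers. The sketch below assumes, purely for illustration, that classrooms I and III received method A and classrooms II and IV received method B; the surviving text fixes only that I is matched with II and III with IV.

```python
from fractions import Fraction
from itertools import product

# Assumed labeling of classrooms by method (not stated in the text).
METHOD = {"I": "A", "II": "B", "III": "A", "IV": "B"}

# Troublemaker 1 goes to I or II; troublemaker 2 goes to III or IV.
outcomes = list(product(["I", "II"], ["III", "IV"]))
p_each = Fraction(1, len(outcomes))

# Method B looks superior in all 60 pairs only if both troublemakers
# land in method A classrooms.
b_sweeps = [o for o in outcomes if all(METHOD[room] == "A" for room in o)]
p_b_sweeps = p_each * len(b_sweeps)

# The researcher's original figure, treating 60 pairs as independent.
p_naive = Fraction(1, 2) ** 60

print(p_b_sweeps)          # 1/4
print(float(p_naive))      # roughly 8.7e-19
```

The value 1/4 equals (1/2)^2, the figure that two independent matched pairs would give, which is the sense of the closing remark above.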

The researcher was so completely in error in his first analysis of the 
probability of a false conclusion because he failed to recognize that con- 
ducting his experiment with intact classrooms did not produce 60 independent 
replications of his experiment. 

Experimental Unit Versus Unit 
of Statistical Analysis 

In the analysis of experiments a distinction must be made between the unit of 
statistical analysis and the experimental unit. Before valid probability 
statements can be made about types of errors, these two units must coincide, 
i.e., the statistical analysis must be carried out on the legitimate experimental 
units. 

Definition: The units of statistical analysis are the data (the 
actual numbers) that we consider to be the out- 
comes of independent replications of our experi- 
ment. If you will, the units of statistical analysis 
are the numbers that we count when we count up 
degrees of freedom “within” or “for replications.”

Imagine that 10 pupils study method I and 10 pupils study method II.
A t-test is performed on the 20 scores on the dependent variable; hence, the
“pupil” is the unit of statistical analysis. For the sake of analysis, each pupil
is considered to be a replication of the experiment; method I is replicated 10
times, and method II is replicated 10 times.

By looking at a researcher’s statistical analysis, we can easily determine
which unit he chose as the unit of statistical analysis. What is more difficult
is to state an adequate definition of the experimental unit.

Definition: The experimental units are the smallest divisions of
the collection of experimental subjects that have
been randomly assigned to the different conditions
in the experiment and that have responded inde-
pendently of each other for the duration of the
experiment.

(This definition clearly reflects the fact [...].) [...] measured on a particular
dependent variable. The probability that subject 1 under method A will score
higher on the dependent variable than subject 1 under method B is 1/2 when
subjects are randomly assigned to methods [...] subject 2 under A and
subject 2 under B, i.e., the second replication of the experiment. If the
probability that subject 2 under method A will score higher than subject 2
under method B is 1/2 regardless of the outcome of the experiment on the
first replication (with subjects 1), then the replications are independent.

Judging the validity of the assumption that the replications of an
experiment are independent is no easy matter. The assumption of homo-
geneous variances is easily tested with Bartlett's test (if we are confident the
normality assumption is met) or [...] test. The assumption of normality is
easily tested with a chi-square or Kolmogorov-Smirnov test if n is reasonably
large. However, the researcher will be faced with the task of making a
considered judgment [...] the degree of independence of the replications
rather than the task of [...] a particular statistical test. His judgment must be
based on an intimate knowledge of the dynamics of the experimental setting.

In some experiments it will be impossible to overlook that nonindependence
of the replications [...], as in our example. In other experiments,
nonindependence of the replications will be subtle and can go unrecognized.

In the example at the beginning [...] of this group of subjects that was
randomly assigned to methods A and B was an individual pupil. However,
the individual pupils did not respond to instruction independently. In two
classrooms, a troublemaker caused the remaining pupils to learn nothing
about balancing chemical equations.

Each classroom did respond independently of the other three classrooms, 
we might assume; and the classrooms were randomly assigned to methods A 
and B. Therefore, the “classroom” is the smallest division of the 120 
pupils that satisfies the conditions of random assignment and independent 
responding. Therefore, the “classroom” is the experimental unit in our 
example. 

A valid analysis of the illustrative experiment would be carried out on 
four observations: the average proficiency test scores of the four classrooms. 
In other words, a valid analysis would use the four classroom means as the 
units of statistical analysis. The experiment comparing methods A and B was
replicated only twice (once for each classroom) for each method. “Class-
room” is the experimental unit; hence, the classroom means must be the
unit of statistical analysis. A valid statistical inferential analysis is possible,
but it is hardly worth the bother. The results of the two replications of the 
experiment give us no more evidence to conclude that method B is better 
than method A, than we have evidence to conclude that a coin is biased in 
favor of “heads” because the first two times we flipped it we got “heads.” 
(However, means based on 30 pupils each should be more stable than means 
based on one pupil each, i.e., the experimental results should be better for 
the 120 pupils than they would have been for just four, one in each class.) 
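
The contrast between the two choices of unit can be made concrete with a small computation. The sketch below uses invented scores, five pupils per classroom rather than 30 for brevity, and assumes classrooms I and III received method A and classrooms II and IV method B. It computes the ordinary pooled-variance t statistic twice: once with the pupil as the unit of analysis and once with the classroom mean as the unit.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Hypothetical proficiency scores, five pupils per intact classroom.
classrooms = {
    "I":   [48, 49, 50, 51, 52],   # method A (assumed)
    "III": [58, 59, 60, 61, 62],   # method A (assumed)
    "II":  [68, 69, 70, 71, 72],   # method B (assumed)
    "IV":  [78, 79, 80, 81, 82],   # method B (assumed)
}

# Pupil as the unit of analysis: 10 scores per method.
pupils_a = classrooms["I"] + classrooms["III"]
pupils_b = classrooms["II"] + classrooms["IV"]
t_pupil, df_pupil = pooled_t(pupils_a, pupils_b)

# Classroom as the unit of analysis: 2 means per method.
means_a = [mean(classrooms["I"]), mean(classrooms["III"])]
means_b = [mean(classrooms["II"]), mean(classrooms["IV"])]
t_class, df_class = pooled_t(means_a, means_b)

print(round(t_pupil, 2), df_pupil)   # about -8.16 on 18 df
print(round(t_class, 2), df_class)   # about -2.83 on 2 df
```

With 2 degrees of freedom the two-tailed .05 critical value of t is 4.303, so the classroom-level analysis of these invented data is not significant, whereas the pupil-level analysis appears overwhelmingly so.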

Educational researchers are especially prone to making the error of 
analyzing data in terms of units other than the legitimate experimental unit. 
Almost all of the comparative experiments carried out under actual school 
conditions face the same dilemma. At most, no more than five or six intact 
classrooms have been involved in the experiment. Perhaps pupils have been 
assigned to classrooms at random, perhaps not. At least, the classrooms 
have been assigned at random to the experimental conditions being compared. 
The researcher has two alternatives, though he is seldom aware of the second 
one: (1) he can run a potentially illegitimate analysis of the experiment by 
using the “pupil” as the unit of statistical analysis, or (2) he can run a legiti- 
mate analysis on the means of the five or six classrooms, classroom being 
the actual experimental unit, in which case he is almost certain to obtain 
statistically nonsignificant results (with only five or six replications, the power 
of his significance test is low). 

If the researcher chooses the first alternative and is led into error, he can 
find solace in the knowledge that methodologists themselves have long 
sanctioned his actions either explicitly or by example.

As early as 1940, Lindquist presented a legitimate analysis of variance
of data gathered from an experiment involving intact groups (see Lindquist,
1940, pp. 107ff.). Lindquist’s thinking was far ahead of that of his colleagues,
if McNemar was representative of the latter. In a review of Lindquist's text,
McNemar (1940) wrote as follows:

We next raise a puzzling question for which we have no definite answer.
Beginning on page 101, the analysis of variance technique is applied to an
educational-methods experiment involving five schools and three methods,
with twenty pupils in each of fifteen classes. The analysis is carried through on
the basis of the fifteen class means in such a way that neither the number of
pupils nor the pupil variation enters into the analysis. This at first struck the
reviewer as being indefensible, but a few pages later (p. 117) one finds that
the pupils have their inning via the ‘within classes’ variation, but this latter
variation is not utilized because the ‘interaction variance’ is larger than the
‘within classes’ variance. The reviewer suspects that something is wrong
with a test of significance which does not involve the variation of the in-
dividuals upon which the means are based. We are unable to locate the
fallacy here, if there be such, but we have in mind a worked-out example in
sex differences which shows no significant difference in length at birth when
analyzed by the variance technique using means, but which yields highly
significant differences when analyzed by the ordinary critical ratio procedure.
We are not arguing that the author is wrong in arguing that intact groups are
the proper sampling units, but rather that the case is not convincingly stated.

Notice that McNemar’s worked-out example, which he considered to be 
contrary to Lindquist's example, is quite unlike a classroom experiment of 
the sort Lindquist discussed. What if McNemar's example involved 20 boys 
(four from each of five families) and 20 girls (four from each of five families)? 


Further Reading 

Fortunately, there now exist about a half-dozen discussions of the problem
of determining the appropriate experimental unit and unit of statistical
analysis:


1. Campbell, D. T. and J. C. Stanley. “Experimental and quasi-
experimental designs in research on teaching.” Chapter 5 in
Handbook of Research on Teaching, ed. N. L. Gage. Chicago:
Rand-McNally, 1963. Relevant pages: passim.

2. Cox, D. R. Planning of Experiments. New York: Wiley, 1958.
Relevant pages: [...]. The designated pages deal with the [...]
of experimenting with intact groups. The context is [...]
experimentation and the discussion is at a [...] level [...]
after you have [...].

3. Hovland, C. I., A. A. Lumsdaine, and F. D. Sheffield. Experiments
on Mass Communication. Princeton, N.J.: Princeton University
Press, 1949. [...]

4. Lindquist, E. F. Design and Analysis of Experiments in Psychology
and Education. Boston: Houghton Mifflin, 1953. Relevant pages:
192-93. A short discussion of experimenting with intact class-
rooms. Lindquist discusses the problem in an educational context.
An important reference. However, it must be read carefully and
critically. When Lindquist prepared this textbook, the procedures
we presented for finding E(MS)'s had not yet been worked out
fully.

5. Lumsdaine, A. A. “Instruments and media of instruction.”
Chapter 12 in Handbook of Research on Teaching, ed. N. L. Gage.
Chicago: Rand-McNally, 1963. See especially pages 656-59.
Pages 656-59 are probably the best readily available discussion of
this crucial problem of running comparative experiments with intact
groups of persons instead of individual persons.

6. Peckham, Percy D., Gene V. Glass, and Kenneth D. Hopkins.
“The experimental unit in statistical analysis: comparative experi-
ments with intact groups.” Journal of Special Education, 3 (1969).
5000 RANDOM DIGITS


99 11 04 61 
38 55 50 55 54

” 54 67 37 04 

32 64 35 28 61 

69 57 26 87 77 

24 12 26 65 91 

61 19 61 02 31 

30 53 22 17 04 

03 78 89 75 99 

22 86 33 79 


93 71 61 68 94 
32 88 65 97 80 
92 05 24 62 15 
95 81 90 68 31 
39 51 03 59 05 

27 69 90 64 94 
92 96 26 17 73 
'0 27 41 22 02 
35 86 72 07 17 
85 78 34 76 19 


66 08 32 
08 35 56 
55 12 12 
00 91 19 
14 06 04 

14 84 54 
41 83 95 
39 68 52 
74 41 65 
53 15 26 




46 53 
08 60 
92 81 
89 36 
06 19 

66 72 
53 82 
33 09 
31 66 
74 33 


84 60 95 82 32 
29 73 54 77 62 
59 07 60 79 36 
76 35 59 37 79 
29 54 96 96 16 

61 95 87 71 00 
17 26 77 09 43 
10 06 16 88 29 
35 20 83 33 74 
35 66 35 29 72 


88 61 
71 29 
27 95 


81 91 61 
92 38 53 
45 89 09 
80 86 30 05 14 
33 56 46 07 80 


90 89 
78 03 
55 98 
87 53 
16 81 


97 57 54
87 02 67 
66 64 85 
90 88 23 
86 03 11 







* 60 36 59 46 53 

83 79 94 24 02 

32 96 00 74 05 

•9 32 25 38 45 

11 22 09 47 47


31 75 15 72 60 
88 49 29 93 82 
30 93 44 77 44 
22 88 84 88 93 
7 8 21 21 69 93 


41 84 98 45 47 
46 35 23 30 49 
11 08 79 62 94 
52 70 10 83 37 
32 27 53 68 98 


20 85 77 31 56
15 63 38 49 24 
92 69 44 82 97 
27 61 31 90 19 
38 68 83 24 86 


25 16 30 18 89 
65 25 10 76 29 
36 81 54 36 25 
64 39 71 16 92 
04 31 52 56 24 


83 76 16 08 73 
14 38 70 63 45 
31 32 19 22 46 
72 47 20 00 08 
05 46 65 53 06 


35 07 53 39 49 

56 62 33 44 42 

36 40 98 32 32 

57 62 05 26 06 
07 39 93 74 08 

68 98 00 53 39 
14 45 40 45 04 
07 48 18 38 28 
27 49 99 87 48 
35 90 29 13 86 

46 85 05 23 26 

69 24 89 34 60 
14 01 33 17 92 
56 30 38 73 15 
81 30 44 85 85 

70 28 42 43 26 

90 41 59 36 14 
39 90 40 21 15 

88 15 20 00 80 

45 13 46 35 45 

70 01 41 50 21 

37 23 93 32 95 

18 63 73 75 09 

05 32 78 21 62 

95 09 66 79 46 

43 25 38 41 45 

80 85 40 92 79 

80 08 87 70 74 

80 89 01 80 02 

93 12 81 84 64 


42 61 42 92 97 
34 99 44 13 74 
99 38 54 16 00 
66 49 76 86 46 
48 50 92 39 29 

15 47 04 83 55 
20 09 49 89 77 
73 78 80 65 33 
60 53 04 51 28 

44 37 21 54 86 

34 67 75 83 00 

45 30 50 75 21 
59 74 76 72 77 

16 52 06 96 76 
68 65 22 73 76 

79 37 59 52 20 
33 52 12 66 65 
59 58 94 90 67 
20 55 49 H 09 

59 40 47 20 59 

41 29 06 73 12 
05 87 00 11 19
82 44 49 90 05 
20 24 78 17 59 
48 46 08 55 58 

60 83 32 59 83 
43 52 90 63 18 
88 72 25 67 36 
94 81 33 19 00 
74 45 79 05 61 


01 91 82 83 16 
70 07 11 47 36 
11 13 30 75 86 
78 13 86 65 59 

27 48 24 54 76 

88 65 12 25 96 
74 84 39 34 13 

28 59 72 04 05 
74 02 28 46 17 

65 74 11 40 14 

74 91 06 43 45 
61 31 83 18 55 
76 50 33 45 13 
11 65 49 98 93 
92 85 25 58 66 

01 15 96 32 67 
55 82 34 76 41 

66 82 14 15 75 
96 27 74 82 57 
43 94 75 16 80 

71 85 71 59 57 

92 78 42 63 40 

04 92 17 37 01 
45 19 72 53 32 
15 19 11 87 82 

01 29 14 13 49 

38 38 47 47 61 

66 16 44 94 31 

54 15 58 34 36 

72 84 81 18 34 


98 95 37 32 31 
09 95 81 80 65 
15 91 70 62 53 
19 64 09 94 13 
85 24 43 51 59 

03 15 21 91 21 
22 10 97 85 08 
94 20 52 03 80 
82 03 71 02 68

87 48 13 72 20 

19 32 58 15 49 
14 41 37 09 51 
39 66 37 75 44 
02 18 16 81 61 

88 44 80 35 84 

10 62 24 83 91 
86 22 53 17 04 

49 76 70 40 37 

50 81 69 76 16 
43 85 25 96 93 

68 97 11 14 30 
18 47 76 56 22 
14 70 79 39 97 
83 74 52 25 67 
16 93 03 33 61 

20 36 80 71 26 
41 19 63 74 80 
66 91 93 16 78 
35 35 25 41 31 
79 98 26 84 16 


f » 87 24 84 
81 61 61 87 11
07 58 61 61 20 

90 76 70 42 35 

40 18 82 81 93 

34 41 48 21 57 
® 43 97 53 63 
67 04 90 90 70 
79 49 50 41 46 

91 70 43 05 52 

18 82 00 97 
„ 58 54 97 
73 18 95 02 07 
73 76 87 64 90 
54 °1 64 40 56 


82 47 42 55 93 
53 34 24 42 76 
82 64 12 28 20 
13 57 41 72 00 
29 59 38 86 27 

86 88 75 50 87 
44 98 91 68 22 
93 39 94 55 47 
52 16 29 02 86 
04 73 72 10 31 

32 82 53 95 27 
51 98 15 06 54 
47 67 72 62 69 
20 97 18 17 49 
66 28 13 10 03 


48 54 53 52 47 
75 12 21 17 24 
92 90 41 31 41 
69 90 26 37 42 
94 97 21 15 98 

19 15 20 00 23 
36 02 40 08 67 
94 45 87 42 84 
54 15 83 42 43 
75 05 19 30 29 

04 22 08 63 04 

94 93 88 19 97 
62 29 06 44 64 
90 42 91 22 72 
00 68 22 73 98 


18 61 91 36 74 
74 62 77 37 07 
32 39 21 97 63 
78 46 42 25 01 
62 09 53 67 87 

12 30 28 07 83 
76 37 84 16 05 
05 04 14 98 07 

46 97 83 54 82 

47 66 56 43 82 

83 38 98 73 74 
91 87 07 61 50 
27 12 46 70 18 
95 37 50 58 71 
20 71 45 32 95 


18 61 11 92 41

58 31 91 59 97 
61 19 96 79 40 
18 62 79 08 72 
00 44 15 89 97 

32 62 46 86 91 
65 96 17 34 88 
20 28 83 40 60 

59 36 29 59 38 
99 78 29 34 78 

64 27 85 80 44 
68 47 66 46 59 
41 36 18 27 60 
93 82 34 31 78 
07 70 61 78 13 
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08 35 86 99 10 
28 30 60 32 64 
53 84 08 62 33 
91 75 75 37 41 
89 41 59 26 94 

77 51 30 38 20 
19 50 23 71 74 
21 81 85 93 13 
51 47 46 64 99 
99 55 96 83 31 

33 71 34 80 07 
85 27 48 68 93 
84 13 38 96 40 
56 73 21 62 34 
65 13 85 68 06 

38 00 10 21 76 
37 40 29 63 97 
97 12 54 03 48 
21 82 64 11 34 
73 13 54 27 42 

07 63 87 79 29 
60 52 88 34 41 
83 59 63 56 55 
10 85 06 27 46 

39 82 09 89 52 

59 58 00 64 78
38 50 80 73 41 
30 69 27 06 68 
65 44 39 56 59 
27 26 75 02 64 

91 30 70 69 91 
68 43 49 46 88 
48 90 81 58 77 
06 91 34 51 97 
10 45 51 60 19

12 88 39 73 43 
21 77 83 09 76 
19 52 35 95 15 
62 24 55 26 70 
M 38 44 73 77 

33 *3 34 13 77 
24 63 73 87 36 
83 08 01 24 Jt 
16 44 42 43 34 
60 79 01 81 57 


78 54 24 27 85 
81 33 31 05 91 
81 59 41 36 28 

61 61 36 22 69 
00 39 75 83 91 

86 83 42 99 01 
69 97 92 02 88 
93 27 88 17 57 
68 10 72 36 21 

62 53 52 41 70 

93 58 47 28 69 
11 30 32 92 70
44 03 55 21 66 
17 39 59 61 31 

87 64 88 52 61 

81 71 91 17 11
01 30 47 75 86 
87 08 33 14 17 
47 14 33 40 72 
95 71 90 90 35 

03 06 11 80 72 
07 95 41 98 14 
06 95 89 29 83 
99 59 91 05 07 
43 62 26 31 47 

75 56 97 88 00 
23 79 34 87 63 
94 68 81 61 27 

18 28 82 74 37 

13 19 27 22 94 

19 07 22 42 10 

84 47 31 36 22 
54 74 52 45 91 
42 67 27 86 01 

14 21 03 37 12 

85 02 76 11 84 
38 80 73 69 61
65 12 25 96 59

35 58 31 65 63 
07 50 03 79 92 

36 06 69 48 50
74 38 48 93 42 
31 99 22 28 15 
36 15 19 90 7j 
57 17 86 57 62


13 66 15 88 73 

40 51 00 78 93 

51 21 59 02 90 

50 26 39 02 12 

12 60 71 76 46 

68 41 48 27 74 

55 21 02 97 73 

05 68 67 31 56 

94 04 99 13 45 

69 77 71 28 30 

51 92 66 47 21 

28 83 43 41 37 

73 85 27 00 91 

10 12 39 16 22 

34 31 36 58 61 

71 60 29 29 37 

56 27 11 00 86 

21 81 53 92 50 

64 63 88 59 02 

85 79 47 42 96 

96 20 74 41 56 
59 17 52 06 95 
05 12 80 97 19 
13 49 90 63 19 
64 42 18 08 14 

88 83 55 44 86 

90 82 29 70 22 
56 19 68 00 91 
49 63 22 40 41 
07 47 74 46 06 

36 69 95 37 28 
62 12 69 84 08 

35 70 00 47 54 
11 88 30 95 28 

91 34 23 78 21 

w 28 50 13 92 
31 64 94 20 96 
86 28 36 82 58
79 24 68 66 86 
45 13 42 65 29 

58 83 87 38 59 
52 62 30 79 92 
07 75 95 17 77 
27 49 37 09 39
11 16 17 85 76 


04 61 89 75 53 
32 60 46 04 75
28 46 66 87 95 
55 78 17 65 14 

48 94 97 23 06 

51 90 81 39 80 
74 28 77 52 51 

07 08 28 50 46 

42 83 60 91 91 
74 81 97 81 42 

58 30 32 98 22 

73 51 59 04 00 
61 22 26 05 61 
85 49 65 75 60 
45 87 52 10 69 

74 21 96 40 49 
47 32 46 26 05 

75 23 76 20 47 

49 13 90 64 41 

08 78 98 81 56 

23 82 19 95 38 

05 53 35 21 39 
77 43 35 37 83 
53 07 57 18 39 

43 80 00 93 51 

23 76 80 61 56 
17 71 90 42 07 

82 06 76 34 00 
08 33 76 56 76 
17 98 54 89 11

28 82 53 57 93 
12 84 38 25 90 

83 82 45 26 92 
63 01 19 89 01 
88 32 58 08 51

17 97 41 50 77
63 28 10 20 23 
69 57 21 37 98 

76 46 33 42 22 
26 76 08 36 37 

49 36 47 33 31 
12 36 91 86 01 
97 37 72 75 85 
85 13 03 25 52 
45 81 95 29 79


31 22 30 84 20 

94 11 90 18 40 

77 76 22 07 91 

83 48 34 70 55 

94 54 13 74 08 

72 89 35 55 07 

65 34 46 74 15 

31 85 33 84 52 

08 00 74 54 49 

43 86 07 28 34 

93 17 49 39 72 

71 14 84 36 43 

62 32 71 84 23 

81 60 41 88 80 

85 64 44 72 77 

65 58 44 96 98 

40 03 03 74 38 

15 50 12 95 78 

03 85 65 45 52 

64 69 11 92 02 

04 71 36 69 94 

61 21 20 64 55 

92 30 15 04 98 

06 41 01 93 62 

31 02 47 31 67 

04 11 10 84 08 

95 95 44 99 53 

05 46 26 92 00 

96 29 99 08 36 

97 34 13 03 58 

28 97 66 62 52 

09 81 59 31 46 

54 13 05 51 60 

14 97 44 03 44 

43 66 77 08 83 

90 71 22 67 69 

08 81 64 74 49 

16 43 59 15 29 

26 65 59 08 02 

41 32 64 43 44 

96 24 04 36 42 

03 74 28 38 73 

51 97 23 78 67 

54 84 65 47 59 

65 13 00 48 60 
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08 35 86 99 10 
28 30 60 32 64 
53 84 08 62 33 
91 75 75 37 41 
89 41 59 26 94 


78 54 24 27 85 
81 33 31 05 91 
81 59 41 36 28 
61 61 36 22 69 
00 39 75 83 91 


13 66 15 88 73 
40 51 00 78 93 
51 21 59 02 90 
50 26 39 02 12 
12 60 71 76 46 


04 61 89 75 53 
32 60 46 04 75 
28 46 66 87 95 
55 78 17 65 14 
48 94 97 23 06 


31 21 30 84 20 
94 11 90 18 40 
77 76 22 07 91 
83 48 34 70 55 
94 54 13 74 08 


77 51 30 38 20 
19 50 23 71 74 
21 81 85 93 13 
51 47 46 64 99 
99 55 96 83 31 


86 83 42 99 01 
69 97 92 02 88 
93 27 88 17 57 
68 10 72 36 21 
62 53 52 41 70 


68 41 48 27 74 
55 21 02 97 73 
05 68 67 31 56 
94 04 99 13 45 

69 77 71 28 30 


51 90 81 39 80 
74 28 77 52 51 
07 03 28 50 46 
42 83 60 91 91 
74 81 97 81 42 


72 89 35 55 07 
65 34 46 74 15 
31 85 33 84 52 
08 00 74 54 49 
43 86 07 28 34 


33 71 34 80 07 
85 27 48 68 93 
84 13 38 96 40 
56 73 21 62 34 
65 13 85 68 06 


93 58 47 28 69 
11 30 32 92 70
44 03 55 21 66 
17 39 59 61 31 
87 64 88 52 61 


51 92 66 47 21 
28 83 43 41 37 
73 85 27 00 91 
10 12 39 16 22 
34 31 36 58 61 


58 30 32 98 22 
73 51 59 04 00 
61 22 26 05 61 
85 49 65 75 60 
45 87 52 10 69 


93 17 49 39 72 
71 14 84 36 43 
62 32 71 84 23 
81 60 41 88 80 
85 64 44 72 77 


38 00 10 21 76
37 40 29 63 97
97 12 54 03 48
21 82 64 11 34
73 13 54 27 42


07 63 87 79 29
60 52 88 34 41
83 59 63 56 55
10 85 06 27 46
39 82 09 89 52


59 58 00 64 78
38 30 80 73 41
30 69 27 06 68
65 44 39 56 39
27 26 75 02 64


91 30 70 69 91
68 43 49 46 88
48 90 81 58 77
06 91 34 51 97
10 45 51 60 19


12 88 39 73 43
21 77 83 09 76
19 52 35 93 15
67 24 55 26 70
60 58 44 73 77


53 85 34 13 77
24 63 73 87 36
83 03 01 24 31
16 44 42 43 34
60 79 01 81 57


81 71 91 17 11
01 30 47 75 86 
87 08 33 14 17 
47 14 33 40 72 
95 71 90 90 35 

03 06 11 80 72
07 95 41 98 14 

06 95 89 29 83 
99 59 91 05 07 
43 62 26 31 47 

75 56 97 88 00 
23 79 34 87 63 
94 68 81 61 27 

18 28 82 74 37 
13 19 27 22 94 

19 07 22 42 10 
84 47 31 36 22
54 74 52 45 91
42 67 27 86 01
14 21 03 37 12

65 02 76 11 84
38 80 73 69 61
65 12 25 96 59

35 58 31 65 63 

07 50 03 79 92 

36 06 69 48 50 
74 38 48 93 42

38 99 22 28 15 
36 1J J9 90 7J 
57 17 86 57 62


71 60 29 29 37 
56 27 11 00 86
21 81 53 92 50 
64 63 88 59 02 

85 79 47 42 96 

96 20 74 41 56 
59 17 52 06 95 
05 12 80 97 19 
13 49 90 63 19 
64 42 18 08 14 

88 83 55 44 86 

90 82 29 70 22 
56 19 68 00 91 
49 63 22 40 41 
07 47 74 46 06 

36 69 95 37 28 
62 12 69 84 08 
35 70 00 47 54 
11 88 30 95 28 

91 34 23 78 21 

W 28 50 13 92 

31 64 94 20 96 

86 28 36 82 58 
79 24 68 66 86 
45 13 42 65 29 

38 83 87 38 59 

32 62 30 79 92 
07 75 95 17 77 
27 49 37 09 39 
11 16 17 85 76 


74 21 96 40 49 
47 32 46 26 05 

75 23 76 20 47 
49 13 90 64 41 
08 78 98 81 56 


23 82 19 95 38 
05 53 35 21 39 
77 43 35 37 83 
53 07 57 18 39 
43 80 00 93 51 

23 76 80 61 56 
17 71 90 42 07 

82 06 76 34 00 
08 33 76 56 76 
17 98 54 89 11

28 82 53 57 93 
12 84 38 25 90 

83 82 45 26 92 
63 01 19 89 01 
88 32 58 08 51 

17 97 41 50 77 
63 28 10 20 23 
69 57 21 37 98 
76 46 33 42 22 
26 76 08 36 37 


49 36 47 33 31
12 36 91 86 01
97 37 72 75 85
85 13 03 25 52
45 81 95 29 79


65 58 44 96 98 

40 03 03 74 38 

15 50 12 95 78 

03 85 65 45 52 

64 69 11 92 02 

04 71 36 69 94 

61 21 20 64 55 

92 30 15 04 98 

06 41 01 93 62 

31 02 47 31 67 

04 11 10 84 08 

95 95 44 99 53 

05 46 26 92 00 

96 29 99 08 36 

97 34 13 03 58 

28 97 66 62 52 

09 81 59 31 46 

34 13 05 51 60 

14 97 44 03 44 

43 66 77 08 83 

90 71 22 67 69 

08 81 64 74 49 

16 43 59 15 29 

26 65 59 08 02 

41 32 64 43 44 

96 24 04 36 42 

03 74 28 38 73 

51 97 23 78 67 

54 84 65 47 59 

65 13 00 48 60 


































TABLE D   PERCENTILE POINTS OF t-DISTRIBUTIONS

                                          Percentile*
 df   55    60    65    70     75     80     85     90     95    97.5      99    99.5    99.95

  1  .158  .325  .510  .727  1.000  1.376  1.963  3.078  6.314  12.706  31.821  63.657  636.619
  2  .142  .289  .445  .617   .816  1.061  1.386  1.886  2.920   4.303   6.965   9.925   31.598
  3  .137  .277  .424  .584   .765   .978  1.250  1.638  2.353   3.182   4.541   5.841   12.941
  4  .134  .271  .414  .569   .741   .941  1.190  1.533  2.132   2.776   3.747   4.604    8.610
  5  .132  .267  .408  .559   .727   .920  1.156  1.476  2.015   2.571   3.365   4.032    6.859

  6  .131  .265  .404  .553   .718   .906  1.134  1.440  1.943   2.447   3.143   3.707    5.959
  7  .130  .263  .402  .549   .711   .896  1.119  1.415  1.895   2.365   2.998   3.499    5.405
  8  .130  .262  .399  .546   .706   .889  1.108  1.397  1.860   2.306   2.896   3.355    5.041
  9  .129  .261  .398  .543   .703   .883  1.100  1.383  1.833   2.262   2.821   3.250    4.781
 10  .129  .260  .397  .542   .700   .879  1.093  1.372  1.812   2.228   2.764   3.169    4.587

 11  .129  .260  .396  .540   .697   .876  1.088  1.363  1.796   2.201   2.718   3.106    4.437
 12  .128  .259  .395  .539   .695   .873  1.083  1.356  1.782   2.179   2.681   3.055    4.318
 13  .128  .259  .394  .538   .694   .870  1.079  1.350  1.771   2.160   2.650   3.012    4.221
 14  .128  .258  .393  .537   .692   .868  1.076  1.345  1.761   2.145   2.624   2.977    4.140
 15  .128  .258  .393  .536   .691   .866  1.074  1.341  1.753   2.131   2.602   2.947    4.073

 16  .128  .258  .392  .535   .690   .865  1.071  1.337  1.746   2.120   2.583   2.921    4.015
 17  .128  .257  .392  .534   .689   .863  1.069  1.333  1.740   2.110   2.567   2.898    3.965
 18  .127  .257  .392  .534   .688   .862  1.067  1.330  1.734   2.101   2.552   2.878    3.922
 19  .127  .257  .391  .533   .688   .861  1.066  1.328  1.729   2.093   2.539   2.861    3.883
 20  .127  .257  .391  .533   .687   .860  1.064  1.325  1.725   2.086   2.528   2.845    3.850

 21  .127  .257  .391  .532   .686   .859  1.063  1.323  1.721   2.080   2.518   2.831    3.819
 22  .127  .256  .390  .532   .686   .858  1.061  1.321  1.717   2.074   2.508   2.819    3.792
 23  .127  .256  .390  .532   .685   .858  1.060  1.319  1.714   2.069   2.500   2.807    3.767
 24  .127  .256  .390  .531   .685   .857  1.059  1.318  1.711   2.064   2.492   2.797    3.745
 25  .127  .256  .390  .531   .684   .856  1.058  1.316  1.708   2.060   2.485   2.787    3.725

 26  .127  .256  .390  .531   .684   .856  1.058  1.315  1.706   2.056   2.479   2.779    3.707
 27  .127  .256  .389  .531   .684   .855  1.057  1.314  1.703   2.052   2.473   2.771    3.690
 28  .127  .256  .389  .530   .683   .855  1.056  1.313  1.701   2.048   2.467   2.763    3.674
 29  .127  .256  .389  .530   .683   .854  1.055  1.311  1.699   2.045   2.462   2.756    3.659
 30  .127  .256  .389  .530   .683   .854  1.055  1.310  1.697   2.042   2.457   2.750    3.646

 40  .126  .255  .388  .529   .681   .851  1.050  1.303  1.684   2.021   2.423   2.704    3.551
 60  .126  .254  .387  .527   .679   .848  1.046  1.296  1.671   2.000   2.390   2.660    3.460
120  .126  .254  .386  .526   .677   .845  1.041  1.289  1.658   1.980   2.358   2.617    3.373
  ∞  .126  .253  .385  .524   .674   .842  1.036  1.282  1.645   1.960   2.326   2.576    3.291
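
Individual entries of Table D can be spot-checked numerically. The sketch below is not part of the original text; the function names are hypothetical, and it uses only the Python standard library (Simpson's rule for the cumulative probability, bisection for the inverse). It is intended for moderate percentiles, not for the extreme entries with 1 or 2 degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] combined with symmetry about 0."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_percentile(p, df, lo=-50.0, hi=50.0):
    """Invert t_cdf by bisection; assumes the answer lies in (-50, 50)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 97.5th percentile with 10 df; compare with the tabled 2.228
print(round(t_percentile(0.975, 10), 3))
```

Because the t-distribution is symmetric about zero, a lower percentile is the negative of the matching upper percentile, so `t_percentile(0.10, 15)` should equal `-t_percentile(0.90, 15)` up to rounding.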


Percentile m the same distribution, »»'" 521 
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TABLE F (cent.) 
TABLE H*  DETERMINATION OF r_tet FOR VARIOUS VALUES OF bc/ad OR ad/bc
FROM A FOUR-FOLD CONTINGENCY TABLE†


r_tet  bc/ad or ad/bc   r_tet  bc/ad or ad/bc   r_tet  bc/ad or ad/bc   r_tet  bc/ad or ad/bc

 .00   1.000             .26   1.941-1.993       .51   4.068-4.205       .76   11.513-12.177
 .01   1.013-1.039       .27   1.994-2.048       .52   4.206-4.351       .77   12.178-12.905
 .02   1.040-1.066       .28   2.049-2.105       .53   4.352-4.503       .78   12.906-13.707
 .03   1.067-1.093       .29   2.106-2.164       .54   4.504-4.662       .79   13.708-14.592
 .04   1.094-1.122       .30   2.165-2.225       .55   4.663-4.830       .80   14.593-15.574
 .05   1.123-1.151       .31   2.226-2.288       .56   4.831-5.007       .81   15.575-16.670
 .06   1.152-1.180       .32   2.289-2.353       .57   5.008-5.192       .82   16.671-17.899
 .07   1.181-1.211       .33   2.354-2.421       .58   5.193-5.388       .83   17.900-19.287
 .08   1.212-1.242       .34   2.422-2.491       .59   5.389-5.595       .84   19.288-20.865
 .09   1.243-1.275       .35   2.492-2.563       .60   5.596-5.813       .85   20.866-22.674
 .10   1.276-1.308       .36   2.564-2.638       .61   5.814-6.043       .86   22.675-24.766
 .11   1.309-1.342       .37   2.639-2.716       .62   6.044-6.288       .87   24.767-27.212
 .12   1.343-1.377       .38   2.717-2.797       .63   6.289-6.547       .88   27.213-30.105
 .13   1.378-1.413       .39   2.798-2.881       .64   6.548-6.822       .89   30.106-33.577
 .14   1.414-1.450       .40   2.882-2.968       .65   6.823-7.115       .90   33.578-37.815
 .15   1.451-1.488       .41   2.969-3.059       .66   7.116-7.428       .91   37.816-43.096
 .16   1.489-1.528       .42   3.060-3.153       .67   7.429-7.761       .92   43.097-49.846
 .17   1.529-1.568       .43   3.154-3.251       .68   7.762-8.117       .93   49.847-58.758
 .18   1.569-1.610       .44   3.252-3.353       .69   8.118-8.499       .94   58.759-71.035
 .19                     .45   3.354-3.460       .70   8.500-8.910       .95   71.036-88.964
 .20                     .46   3.461-3.571       .71   8.911-9.351       .96   88.965-117.479
 .21                     .47   3.572-3.687       .72   9.352-9.828       .97   117.480-169.503
 .22                     .48   3.688-3.808       .73   9.829-10.344      .98   169.504-292.864
 .23                     .49   3.809-3.935       .74   10.345-10.903     .99   292.865-923.687
 .24   1.839-1.888       .50   3.936-4.067       .75   10.904-11.512    1.00   923.688-∞
 .25   1.889-1.940


* Values in this table were calculated by Thomas O. Maguire.

† If bc/ad is greater than 1, the value of r_tet is read directly from this table. If ad/bc
is greater than 1, the table is entered with ad/bc and the value of r_tet is negative.
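
The table lookup above can be approximated in code. The source does not say how the tabled ranges were computed, so the formula below is an assumption: the familiar cosine-pi approximation to the tetrachoric coefficient, which appears to agree with nearby table entries to two decimals. The function name and the cell labels a, b, c, d are hypothetical, and all four cell frequencies are assumed to be nonzero.

```python
import math

def r_tet_cos_pi(a, b, c, d):
    """Approximate tetrachoric r from the four cell frequencies of a 2 x 2 table.

    Assumption (not stated in the source): the cosine-pi approximation
    r = cos(pi / (1 + sqrt(ratio))), with the ratio taken so that it is
    at least 1. The sign follows the table's footnote: positive when
    bc/ad > 1, negative when ad/bc > 1.
    """
    ratio = (b * c) / (a * d)
    if ratio >= 1:
        return math.cos(math.pi / (1 + math.sqrt(ratio)))
    return -math.cos(math.pi / (1 + math.sqrt(1 / ratio)))

# bc/ad = 410/100 = 4.1 falls in the tabled range 4.068-4.205 (r_tet = .51)
print(round(r_tet_cos_pi(10, 41, 10, 10), 2))
```

Swapping the roles of the diagonals, for example `r_tet_cos_pi(41, 10, 10, 10)`, gives the same magnitude with a negative sign, which mirrors the footnote's rule for entering the table with ad/bc.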
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TABLE G* 


FISHER'S Z-TRANSFORMATION OF r




coefficient) 


TABLE J   CONFIDENCE INTERVALS AROUND r ON ρ FOR n = 3, 4, ..., 400

(FIND UPPER-LIMIT VALUE ABOVE THE PRINCIPAL DIAGONAL,
AND LOWER-LIMIT VALUE BELOW IT)



TABLE I   CRITICAL VALUES OF THE CORRELATION COEFFICIENT*


df = n - 2    α = .10     .05      .02      .01

     1         .988      .997    .9995    .9999
     2         .900      .950     .980     .990
     3         .805      .878     .934     .959
     4         .729      .811     .882     .917
     5         .669      .754     .833     .874
     6         .622      .707     .789     .834
     7         .582      .666     .750     .798
     8         .549      .632     .716     .765
     9         .521      .602     .685     .735
    10         .497      .576     .658     .708
    11         .476      .553     .634     .684
    12         .458      .532     .612     .661
    13         .441      .514     .592     .641
    14         .426      .497     .574     .623
    15         .412      .482     .558     .606
    16         .400      .468     .542     .590
    17         .389      .456     .528     .575
    18         .378      .444     .516     .561
    19         .369      .433     .503     .549
    20         .360      .423     .492     .537
    21         .352      .413     .482     .526
    22         .344      .404     .472     .515
    23         .337      .396     .462     .505
    24         .330      .388     .453     .496
    25         .323      .381     .445     .487
    26         .317      .374     .437     .479
    27         .311      .367     .430     .471
    28         .306      .361     .423     .463
    29         .301      .355     .416     .456
    30         .296      .349     .409     .449
    35         .275      .325     .381     .418
    40         .257      .304     .358     .393
    45         .243      .288     .338     .372
    50         .231      .273     .322     .354
    60         .211      .250     .295     .325
    70         .195      .232     .274     .302
    80         .183      .217     .256     .283
    90         .173      .205     .242     .267
   100         .164      .195     .230     .254


Table I is reprinted from Table V.A. of Fisher & Yates, Statistical Methods for Research
Workers, published by Oliver and Boyd Ltd., Edinburgh, and by permission of the author
and publishers.
* If the absolute value of an r from a sample of size n exceeds the tabled value for α and
n - 2, the null hypothesis that ρ = 0 may be rejected at the α-level of significance;
the alternative hypothesis is that ρ ≠ 0. For example, a sample r of .59 with n = 20 leads
to rejection of the hypothesis ρ = 0 at the .01 level of significance.
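
The critical values of r are tied to the percentiles of the t-distribution through t = r·sqrt((n - 2)/(1 - r^2)). Solving that relation for r gives r = t/sqrt(t^2 + df). The sketch below is an illustration rather than the method used to build the table; the function name is hypothetical and the t values are taken from a t-table.

```python
import math

def critical_r(t_crit, df):
    """Critical |r| for a two-tailed test of rho = 0, given the critical t
    with df = n - 2 degrees of freedom (obtained by inverting
    t = r * sqrt(df / (1 - r**2)))."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# n = 20, so df = 18; the 99.5th percentile of t with 18 df is 2.878
print(round(critical_r(2.878, 18), 3))
```

With n = 20 and α = .01 the result agrees with the tabled .561, which is why a sample r of .59 leads to rejection in the footnote's example.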
TABLE K   ABSOLUTE VALUES OF THE CRITICAL VALUES OF SPEARMAN'S
RANK CORRELATION COEFFICIENT, r_s, FOR TESTING THE NULL
HYPOTHESIS OF NO CORRELATION WITH A TWO-TAILED TEST*


  n     α = .10    α = .05    α = .02    α = .01

  5      0.900       --         --         --
  6      0.829      0.886      0.943       --
  7      0.714      0.786      0.893
  8      0.643      0.738      0.833      0.881
  9      0.600      0.683      0.783      0.833
 10      0.564      0.648      0.745      0.818
 11      0.523      0.623      0.736      0.794
 12      0.497      0.591      0.703      0.780
 13      0.475      0.566      0.673      0.745
 14      0.457      0.545      0.646      0.716
 15      0.441      0.525      0.623      0.689
 16      0.425      0.507      0.601      0.666
 17      0.412      0.490      0.582      0.645
 18      0.399      0.476      0.564      0.625
 19      0.388      0.462      0.549      0.608
 20      0.377      0.450      0.534      0.591
 21      0.368      0.438      0.521      0.576
 22      0.359      0.428      0.508      0.562
 23      0.351      0.418      0.496      0.549
 24      0.343      0.409      0.485      0.537
 25      0.336      0.400      0.475      0.526
 26      0.329      0.392      0.465      0.515
 27      0.323      0.385      0.456      0.505
 28      0.317      0.377      0.448      0.496
 29      0.311      0.370      0.440      0.487
 30      0.305      0.364      0.432      0.478


Adapted from E. G. Olds, "Distributions of sums of squares of
rank differences for small numbers of individuals," Annals of
Mathematical Statistics, 9 (1938), 133-48, and "The 5% significance
levels for sums of squares of rank differences and a correction,"
Annals of Mathematical Statistics, 20 (1949), 117-18, by permission
of The Institute of Mathematical Statistics.

* The tabled values are absolute values of the critical values for
two-tailed tests. For example, the critical values of r_s for n = 10
and α = .10 are +0.564 and -0.564.
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correlation coefficient 


TABLE J (cont.) 



SQUARES AND SQUARE ROOTS OF THE INTEGERS FROM 
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TABLE L 


PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS 
OBSERVED VALUES OF S IN KENDALL'S r WHEN THE NULL 
HYPOTHESIS OF NO CORRELATION IS TRUE 


Values of N 

5 ! 

Values of N 

« 

5 

8 

9 


6 

7 

10 

.625 

J92 

.545 

•540 

1 

.500 

.500 

-500 

J7S 

.40? 

.452 

.460 

3 

lJ* 0 

MS 

.431 

.167 

.242 

J60 

.381 

5 

,235 

.281 

.364 

.042 

.117 

.274 

.306 

7 

.136 

.191 

.300 


M2 

.199 

-233 

9 

.063 

.119 

J42 


.0033 

.135 

.179 

11 

.028 

.068 

.190 



.039 

.130 

13 

.0033 

.035 

.146 



.054 

.0X1 

15 

.0014 

.015 

.103 



.031 

.060 

17 


.0054 

.078 



.016 

.038 

19 


.0014 

.054 



.0071 

.022 

21 


.00020 

.036 



.0028 

.012 

23 



.023 



.00057 

.0063 

25 



.014 



.00019 

.0029 

27 



.0083 



.000025 

.0012 

29 



.0046 




.00043 

31 



.0023 




.00012 

33 



.0011 




.000025 

35 



.00047 




.0000028 

37 



.00018 





39 



.000058 





41 



.000015 





43 



.0000028 



1 


45 



.00000028 




S40 



16.7631 
TABLE M (cont.)



TABLE N (cont.)






TABLE O   z-SCORE EQUIVALENTS OF SELECTED PERCENTILES IN THE
UNIT NORMAL DISTRIBUTION*


Percentile      z       Percentile      z       Percentile      z

   0.01      -3.719         28       -0.583         74        0.643
   0.05      -3.291         30       -0.524         75        0.675
   0.1       -3.090         32       -0.468         76        0.706
   0.5       -2.576         34       -0.412         78        0.772
   1         -2.326         36       -0.358         80        0.842
   2         -2.054         38       -0.305         82        0.915
   3         -1.881         40       -0.253         84        0.994
   4         -1.751         42       -0.202         86        1.080
   5         -1.645         44       -0.151         88        1.175
   6         -1.555         46       -0.100         90        1.282
   7         -1.476         48       -0.050         91        1.341
   8         -1.405         50        0.000         92        1.405
   9         -1.341         52        0.050         93        1.476
  10         -1.282         54        0.100         94        1.555
  12         -1.175         56        0.151         95        1.645
  14         -1.080         58        0.202         96        1.751
  16         -0.994         60        0.253         97        1.881
  18         -0.915         62        0.305         98        2.054
  20         -0.842         64        0.358         99        2.326
  22         -0.772         66        0.412         99.5      2.576
  24         -0.706         68        0.468         99.9      3.090
  25         -0.675         70        0.524         99.95     3.291
  26         -0.643         72        0.583         99.99     3.719


* Values in this table were found by interpolating in Table B. 
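
For percentiles that are not tabled, the z-score can be computed directly instead of interpolated. A minimal sketch using Python's standard library follows; the helper name is hypothetical. Because the tabled values were found by interpolation, an occasional entry may differ from the computed value in the third decimal place.

```python
from statistics import NormalDist

def z_for_percentile(percentile):
    """z-score below which the given percentage (0 < percentile < 100) of the
    unit normal distribution falls."""
    return NormalDist().inv_cdf(percentile / 100)

for p in (10, 95, 99):
    print(p, round(z_for_percentile(p), 3))
```

The three percentiles shown correspond to the tabled values -1.282, 1.645, and 2.326.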



TABLE P   CONFIDENCE INTERVALS AROUND A SAMPLE PROPORTION, p,
ON A POPULATION PROPORTION, P

From E. S. Pearson and C. J. Clopper, "The use of confidence intervals or fiducial
limits illustrated in the case of the binomial," Biometrika, 26 (1934), 404. Reproduced by
permission of the Biometrika Trustees.


Confidence coefficient 0.99, α = 0.01




appendix 


ANSWERS 
TO PROBLEMS 
AND EXERCISES 


CHAPTER 1 


1. a. nominal b. ordinal c. ratio d. nominal

2. a. nominal b. ordinal

3. a. 6 yr 4.5 mos.-6 yr 5.5 mos.

b. 2 lb 12.75 oz-2 lb 13.25 oz

c. $342.50-$343.50

4. a. X 12 b. Aii c - 

5. a. 19 b. 10 c. 11 d. 13 

6. a. 60 b. 30.5 c. 5 d. 15^2 = 225

7. a. 26 b. 15 c. 7 d. 9 + 4 + 1 + 25 = 39

8. a. ** + *1 + + A"* b. X t + X s + X, c. (*, + X t + A'j) 5 

9. a. 3 £ J-, b. (S- r <J '• XW + V 

d. jr X ( (T, + I) -2 (JT« + Jr,) 
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556 APPENDIX B 


CHAPTER 3 

2. P 50 = 108.41 


CHAPTER 4 

1. Mean = 2.56; median = 2.8; mode = 3.05.

2. Mean = 3.06; median = 3.3. 

3. Mean = 14.34; median = 14.45. 

4. Mean = 3(14.34) =43.02; median = 43.35. 

5. Mean = 8.86; median = 9.65.

6. Mean = 13.3; combined median can't be determined from a knowledge of
individual medians and n's only.

7. At point D. 


CHAPTER 5 


1. Inclusive range = 21; variance, s^2 = 32.34; standard deviation, s = 5.68;
mean deviation = 4.22.

2. Q = 15.

3. s_x^2 = 0.039; s_x = 0.198.

4. a. positively skewed b. positively skewed 
c. positively skewed d. negatively skewed 

5. “greater than”; variance of combined groups is 66.9. 

6. b. 0.73 c. 2.60 d. -2.60 

7. a. Student B. b. Test 1: z_A = -.64, z_B = +.41.

Test 2: z_A = 2.49, z_B = -.07.

c. Student A. 




8. Σ z_i^2 = Σ (X_i - X̄)^2 / s_x^2 = (n - 1) s_x^2 / s_x^2 = n - 1.


CHAPTER 6


1. a. .1587 b. .9772 c. .0505 d. .0250 e. .4987 f. .6915
g. .8664

2. a. .2420 b. .2420 c. .0317 d. .3945

3. a. 0.00 b. +1.00 c. -1.00 d. +1.645 e. +2.58 f. -2.58
g. +1.28

4. a. 50 b. 89 c. 6 d. 38 e. 99

5. Variance of Y for X = 1.50 is also 0.50.





CHAPTER 7


1. If z_x = z_y, then r_xy = (Σ z_x z_y)/(n - 1) = (Σ z_x^2)/(n - 1). In Prob. 8 of
Chapter 5, it was shown that Σ z^2 = n - 1. Hence r_xy = (n - 1)/(n - 1) = 1.

2. Brown and Smith will obtain the same value for the correlation coefficient, 
since height in inches is 12 times height in feet. A linear transformation of X 
and/or Y will not change the value of the correlation coefficient. 

3. a. positive b. negative c. positive d. positive e. negative 

4. Since r_xy can't exceed +1, r_xy = (s_xy)/(s_x s_y) ≤ 1. We know that s_x = 5 and
s_y = 4; hence (s_xy)/(5 · 4) ≤ 1, which implies that s_xy ≤ 20.

5. X is more closely linearly related to Z than to Y. 

6. The researcher is inferring a causal relationship solely from correlational 
evidence. He has no justification for doing so. It may well be the case— and 
probably is so — that teachers’ salaries and the “drop-out rate” are both a 
function of the social and economic status of the community and that increasing 
teachers’ salaries in a given school would not bring about a decrease in the 
“drop-out rate.” 

7. a. r = -.04. 

c. X and Y are curvilinearly related. There is almost no linear relationship 
between them as indicated by the Pearson product-moment correlation 
coefficient of —.04. 



CHAPTER 8


2. As the line moves from X = 2 to X = 3, it drops one unit, from Y = 2 to
Y = 1. Hence the slope, b_1, of the line is -1. If the line falls one unit on the
Y axis for each unit it moves to the right on the X axis, then the line must rise
one unit for each unit it moves to the left on the X axis. Thus, as X moves from
2 to 0, the line rises 2 units on the Y axis from 2 to 4. Hence, b_0 = 4.

3. a. 2.93 b. 1.80 c. 2.06 d. 3.24 
4. b_0 = 46.76, b_1 = .194. Thus Ŷ = .194X + 46.76. Ŷ for an X of 33 is 53.16.

5. j, = 7.16. 

7. r, = 99.37, Y t = 93.63. 

8. No. 

9. Smith's regression line will have the equation Ŷ = 5X + 250.

10. Transforming X into cX + d divides b_1 by c, i.e., for the transformed X's, the
slope of the regression line is b_1/c. Since b_0 = Ȳ - b_1 X̄, the Y intercept for
the transformed data, denoted by b_0*, is

b_0* = Ȳ - (b_1/c)(cX̄ + d) = b_0 - (b_1 d)/c.

CHAPTER 9 

1. a. point biserial b. Spearman's rho c. rank biserial 
d. Pearson's r e. Phi coefficient f. rank biserial 

2. a. ^ - 0.08. b. * - 0.35. 

3. r 9i - 0.25. 

4. r M .«0.47. , 

5. r M , - -0.75. 

6. r = 0.60. 

7. r_tet = 0.50.

8. a. r, - 0.87. b . r = 0.76. 

9. - 0.36. 

10. a. Partial r equals 0.32. b. R z , , = 0 38 

11. 4,-0. 4,- .32, 4,-. .12. K, 1 04 | ‘ 

12. No. 

CHAPTER 10


1. 1 PH) - 13/52 . 1/4. b 

A PM US). 1.(4, + P(s ,'_ 1 ’ A PM ns) = 0/52 . 0. 

16/52 « 4/13. 1 n B, _ (13/52, + (4/52 , _ (| , 52 , _ 

2 ' £ '!“!? problems. 

« a child ha emotional problems. 

PM) - m “ S raP '";,' a " d Pihblc™. 

'•M-P, - .03 i ,i 6 :° 6 oi 5 /^ B ’“-015. 

3 ’ J ?. 4 b - 210 c. 20 

4. No. ” ' ,9 ' 200,,,. 
5. a. 20 b. 1 c. 1

6- = 1287. 

7. 1 + 4 + 6 + 4 + 1 = 16; 2^4 = 16.

8. (1/2)C(10,5) = (1/2)(252) = 126.

9. P(X = 0) = C(10,0)(1/2)^0(1/2)^10 = (1/2)^10.

P(X = 1) = C(10,1)(1/2)^1(1/2)^9 = 10(1/2)^10.

P(X = 2) = C(10,2)(1/2)^2(1/2)^8 = 45(1/2)^10.

P(X = 3) = C(10,3)(1/2)^3(1/2)^7 = 120(1/2)^10.

P(X = 0 or 1 or 2 or 3) = (1 + 10 + 45 + 120)(1/2)^10 = 176(1/2)^10.
P(X = 4 or more) = 1 - 176(1/2)^10 = 1 - 176/1024 = 1 - .17 = .83.
10.        Expected
    X      frequency

    0         81
    1        108
    2         54
    3         12
    4          1
             256

12. (1/2)^5 = 1/32.

13. a. .79 b. .006 c. (.6915)^3 = .331.

14. E(X) = 2\. 

15. a. E(X) = 2 b. σ^2 = 1.
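
The binomial arithmetic in Problem 9 above (ten trials, success probability 1/2) can be checked with a few lines of code. This is an illustrative sketch, not part of the original answer key; the function name is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
# Number of equally likely outcomes (out of 2**10 = 1024) with 0, 1, 2, or 3 successes
ways_0_to_3 = sum(comb(n, k) for k in range(4))
p_0_to_3 = sum(binomial_pmf(k, n, p) for k in range(4))

print(ways_0_to_3)               # 1 + 10 + 45 + 120
print(round(1 - p_0_to_3, 2))    # P(X = 4 or more)
```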

CHAPTER 11 


1. a. 0.10 b. 2.09 c. 1.28 d. 1.29 e. 0.87 f. 3.65 g. 37.70

2. 15.507/8 = 1.94.


3. Probability equals .02. 

4. .05F_{4,8} = 1/.95F_{8,4} = 1/6.04 = 0.166.

5. Positively skewed. 
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6. Median is below n. 

7. .95χ^2_20 = 31.41. Setting (X - 20)/6.32 equal to 1.64, the 95th percentile of the
unit normal distribution, gives a value of X equal to 30.36.

8. E(t_n^2) = E(F_{1,n}) = n/(n - 2).
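
The normal approximation used in Problem 7 above treats a chi-square variable with n degrees of freedom as roughly normal with mean n and standard deviation sqrt(2n). A short sketch of that calculation follows; the function name is hypothetical, and z = 1.64 is the rounded 95th-percentile value the answer uses.

```python
import math

def chi_square_percentile_normal_approx(df, z):
    """Approximate chi-square percentile from the normal approximation
    with mean df and standard deviation sqrt(2 * df)."""
    return df + z * math.sqrt(2 * df)

approx = chi_square_percentile_normal_approx(20, 1.64)
print(round(approx, 2))  # compare with the exact tabled value 31.41
```

The small difference from the answer's 30.36 arises because the answer rounds sqrt(40) to 6.32 before multiplying. Either way the approximation falls about one unit short of the exact 31.41, reflecting the positive skew of the chi-square distribution.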

CHAPTER 12 


2. a. same b. different c. same d. same e. different 

3. The probability that s_x^2 exceeds σ_x^2 is less than .50.

4.      σ^2_X̄    σ_X̄

a.     25.0     5.00

b.     12.5     3.54

c.     6.25     2.50

d.     3.13     1.77

e.     1.56     1.25

f.     0.781    0.883

g.     0.050    0.224

h.     0.025    0.158

5. A sample of size 40 would have to be taken. 

6. a. .68 b. .90 c. .99 d. .50 e. .997 

7. The 95% confidence interval is 105.97 to 107.53. 

8. 95% 99% 


a. (-.01, .65) (-.14, .71)

b. (-.89, -.12) (-.93, .09)

c. (.03, .25) (-.01, .28)

9. The statistic m_b has the smaller variance error; however, as an estimator of the
population mean, it is biased.


CHAPTER 13 


4 A lh ' "^ ion ° ( "• *to> it is tree. 

c- r ^- 

t. Thepowerofatest 1 P r, ^ blIlt y °f committing a type I error. 

tr The critical « r «3«So S i/,wh« It is false. 

investigator will rcieci w r v 3 ues 3 “fnple statistic for which the 
S* »rll reject H, ,f hr, sample y „. Ids „„ „„„ vaIlK 
2. In none of the cases can it be concluded with certainty that H_0: ρ = 0 is false.

3. a. type I error b. no error c. no error d. type II error 

4. A type II error cannot be committed if H_0 was rejected, since a type II error is
by definition the acceptance of a false H_0.

5. Probability of a type I error is approximately .16. 

6. a. Critical region is split between both tails.
b. upper tail c. lower tail

7. a. The probability of a type I error is the same for both researchers; a = .05. 

b. Rowe has a larger probability of committing a type II error when ρ = .10.

c. null 

8. a. .10 b. .95 c. .95 d. above .99

9. Probability of a type II error when ρ = .10 is greater than .99; hence, the power
of the test of H_0: ρ = 0 against the alternative ρ = .10 is less than .01.

If there is believed to be a high probability that ρ deviates no more from 0
than +.10 or -.10, if at all, then the investigator is wasting his time; he is
almost certain to end up accepting H_0. He should either increase α, which would
have the effect of moving the critical values of r closer to zero, or he should
increase n.

CHAPTER 14 

—3.54 

1. t = ---- = -.34, which is not significant at the .01 level since .995t_40 = 2.704.

2. t = (7.90/3.10) = 2.55. Since .975t_18 = 2.101, the value of t is significant at the
.05 level.

3. Z_r ± 2.58/√(n - 3) = (-.221, .605), which equals (-.217, .540) when converted
back to the scale of r_xy.

4. a. t = r√[(n - 2)/(1 - r^2)] = 1.033. Since .975t_73 is approximately equal to
2.00, the value of r of .12 is nonsignificantly different from zero at the .05
level. (Corroborate this result by use of Table I in Appendix A.)

b z = llLJJE-0.81. Since * 75 z = 1.96, the difference between the two 

v'l{72 + t/St 

r's is not significant at the .05 level. (Note that the numerator of z is in
terms of Z_r's and not r's. It just happens that in this instance the two Z_r's
and r's are identical to two decimal places.)

5. z = (3.178/1.128) = 2.82. Since 2.82 exceeds .975z = 1.96, r_xy and r_xz are
significantly different. The results indicate that the Miller Analogies Test (X)
is more highly related to a vocabulary test (Y) than to a nonverbal reasoning
test (Z).

6. z = -4.72, which is significant far beyond the .01 level.

7. z = √n φ = √69 (.24) = √3.97 = 1.99. Since .95z = 1.64 and .05z = -1.64,
reject the null hypothesis at the .10 level.
= 0.47, which is nonsignificant at the .01 level.

9. The obtained value of χ^2 is 24.69. The 95th percentile in the χ^2 distribution with
6 degrees of freedom is 12.59. Hence, reject the null hypothesis of "no
association" between the two factors of classification at the .05 level.

10. The obtained value of χ^2 is approximately 2.50. The 90th and 95th percentiles
in the χ^2 distribution with 1 degree of freedom are 2.71 and 3.84 respectively;
hence the obtained χ^2 is nonsignificant at both the .10 and .05 levels.

11. The obtained value of χ^2 is 13.27. The 99th percentile in the χ^2 distribution with
4 degrees of freedom is 13.28. While the obtained value of χ^2 is strictly
nonsignificant at the .01 level, one should have few compunctions about announcing
significance at the .02 level.

12. Since zero lies between the limits of the confidence interval, it is true that

(X̄_.1 - X̄_.2) - .975t · s_(X̄.1 - X̄.2) < 0 < (X̄_.1 - X̄_.2) + .975t · s_(X̄.1 - X̄.2).

By subtracting (X̄_.1 - X̄_.2) from all three sides of the above inequality and then
dividing all sides of the inequality by -s_(X̄.1 - X̄.2) (which reverses the direction
of the inequality), one obtains

.975t > (X̄_.1 - X̄_.2)/s_(X̄.1 - X̄.2) > -.975t,

which shows that the t-statistic for testing H_0: μ_1 = μ_2 lies between the critical
values for testing the null hypothesis at the .05 level.


CHAPTER 15 


1. a. $df_b = 1$, $df_w = 6$. b. $df_b = 4$, $df_w = 5$.

c. $df_b = 2$, $df_w = 10$. d. $df_b = 1$, $df_w = 7$.

2. a. $_{.99}F_{1,10} = 10.04$. b. $_{.90}F_{4,30} = 2.14$. c. $_{.95}F_{2,15} = 3.68$.

3. a. Yes, this statement is equivalent to the statement of the alternative
hypothesis, $H_1$.

b. No. c. Yes. d. No. 


4. $MS_b = 20.31$; $MS_w = 6.61$; F = 3.07.

5. $MS_b = 190.68$; $MS_w = 221.56$; F = 0.86. Since the critical value $_{.99}F$ = 5.49, F is not
significant at the .01 level.


6. $MS_b = 312.05$; $MS_w = 48.20$; F = 6.5; $(2.55)^2 = 6.50$.

7. ANOVA table:

Source of var.    df    MS       F

Between groups     3    35.735   6.51

Within groups     68     5.49

Since $_{.95}F_{3,68} = 2.74$, $H_0$ is rejected at the .05 level.
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8. ANOVA table: 

Source of var. df MS F 

Between groups 2 233.79 8.34 

Within groups 21 28.02 

Since $_{.95}F_{2,21} = 3.47$, $H_0$ is rejected at the .05 level.

9. ANOVA table: 

Source of var. df MS F 

Between groups 2 9.733 4.38 

Within groups 27 2.222 

An F-ratio of 4.38 exceeds the 95th percentile in the $F_{2,27}$ distribution but fails
to exceed the 99th percentile in that distribution.

10. ANOVA table: 

Source of var. df MS F 

Between groups 3 145.61 35.43 

Within groups 13 4.11 

Reject H 0 at the .01 level. 
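
The F-ratios reported in the ANOVA tables of answers 8 through 10 can be reproduced from the tabled mean squares. The sketch below is only an arithmetic check: the function name `f_ratio` is illustrative, and the mean squares are copied from the tables above.

```python
def f_ratio(ms_between: float, ms_within: float) -> float:
    """Return the one-factor ANOVA F-ratio, MS_between / MS_within."""
    return ms_between / ms_within


# (MS between groups, MS within groups) as read from answers 8, 9, and 10
mean_squares = {
    8: (233.79, 28.02),
    9: (9.733, 2.222),
    10: (145.61, 4.11),
}

ratios = {item: round(f_ratio(ms_b, ms_w), 2)
          for item, (ms_b, ms_w) in mean_squares.items()}

for item, f in ratios.items():
    print(f"Answer {item}: F = {f}")
```

Each computed ratio is then compared with the appropriate percentile of the F distribution, which must still be looked up in Appendix A.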

11. The value of $\phi$ is $\sqrt{2} = 1.414$; $n_1 = J - 1 = 4$; $n_2 = N - J = 20$; and $\alpha$ =
.05. From Table N in Appendix A (the power of the F-test) we see that the
power, $1 - \beta$, is approximately .60.

12. Since $s_{\bar{X}}^2$ estimates $\sigma^2/n$, $n s_{\bar{X}}^2$ estimates $\sigma^2$. Hence, the estimate of $\sigma^2$ is
20(2.40) = 48.0.

13. $E(MS_b) = \sigma^2 + \dfrac{n \sum_j (\mu_j - \mu)^2}{J - 1} = 20 + 250 = 270$.

CHAPTER 16 

1. a. $_{.95}q_{4,30} = 3.845$. b. $_{.99}q_{6,120} = 4.872$. c. $_{.90}q_{10,10} = 4.913$.

d. ,999i2. « = 9.485. e. .» 5 9 2 .« 0 ■=* 2.829. 

2. a. Probability equals .05; note the value of 95 y S M . 

b. One percent would exceed 5.048; note the value of $_{.99}q_{5,30}$.

3. Any pair of means must differ by at least Mqt.^MSJn = 8,32 to be judged 
significantly different at the .05 level. By this criterion, the following differences 
between means are statistically significant: 

X A - X A = 9.32 Xjt - X A = 16.94 

X A - JTj - 12.17 X A - Xji « 17.45 

X A - X ml = 20.41 X A - X A = 11. 09 

X A - X s = 20.92 X A — X A = J J.60 

X A - X, * = 8.70 X A - X A «• 8.75 
4. $MS_w = 5.49$, $n = 18$, $J = 4$, $_{.95}q_{4,68} = 3.72$. The value of $_{.95}q_{4,68}\sqrt{MS_w/n}$ =
2.05. If any two means differ by more than 2.05 units, then they will be judged
significantly different at the .05 level by the T-method. By this criterion, the
following pairs of means are significantly different:

X A - X A = 3.33, X , - X A = 2.21 , - X', = 2.38. 

We see that the “control group,” group 4, differs from the three experimental 
groups that cannot be judged to differ among themselves. 

5. $MS_w = 102.05$, $n = 20$, $J = 3$, $_{.99}q_{3,57} = 4.282$. The value of $_{.99}q_{3,57}\sqrt{MS_w/n}$
is 9.68. Any pairs of means differing by at least 9.68 units will be judged
significantly different at the .01 level. The following pairs of means are
significantly different:

JF j - JPj -> -13.70 X x - X 3 = -15.75. 

6. a. ANOVA table:

Source of var.    df    MS        F

Between groups     3    276.731   15.94

Within groups     24     17.365

Since the obtained F-ratio exceeds $_{.95}F_{3,24} = 3.01$, the null hypothesis of no
differences among the four population means is rejected at the .05 level.

b. $MS_w = 17.37$, $J = 4$, $n = 7$, $_{.95}F_{3,24} = 3.01$, $_{.95}q_{4,24} = 3.901$. For
two means to differ significantly by the T-method, the absolute value of the
difference between them must exceed $_{.95}q_{4,24}\sqrt{MS_w/n} = 6.14$.

For two means to differ significantly by the S-method, the absolute
value of the difference between them must exceed
$\sqrt{(J - 1)\,_{.95}F_{3,24}}\,\sqrt{MS_w(2/n)} = 6.69$. The results of the multiple comparisons appear
below:

                                        Is difference significant   Is difference significant
Comparison                              by T-method?                by S-method?

$\bar{X}_{.1} - \bar{X}_{.2} = -2.57$    No                          No
$\bar{X}_{.1} - \bar{X}_{.3} = -9.16$    Yes                         Yes
$\bar{X}_{.1} - \bar{X}_{.4} = -13.84$   Yes                         Yes
$\bar{X}_{.2} - \bar{X}_{.3} = -6.59$    Yes                         No
$\bar{X}_{.2} - \bar{X}_{.4} = -11.27$   Yes                         Yes
$\bar{X}_{.3} - \bar{X}_{.4} = -4.68$    No                          No
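
The two criteria in answer 6b can be recomputed and applied to the six pairwise differences. This is a sketch under stated assumptions: the critical values of q and F are copied from the answer rather than computed, the within-groups mean square is the unrounded 17.365 from the ANOVA table, and the group labels 1 through 4 follow the order of the listed comparisons.

```python
from math import sqrt

ms_w = 17.365   # MS within groups from the ANOVA table in part a
J, n = 4, 7     # number of groups and observations per group
q_crit = 3.901  # 95th percentile of the studentized range, copied from the answer
f_crit = 3.01   # 95th percentile of F with 3 and 24 df, copied from the answer

# Minimum absolute difference judged significant by each method
t_bound = q_crit * sqrt(ms_w / n)                         # T-method (Tukey)
s_bound = sqrt((J - 1) * f_crit) * sqrt(ms_w * 2 / n)     # S-method (Scheffe)

# Pairwise differences between means as listed in the answer
differences = {
    (1, 2): -2.57, (1, 3): -9.16, (1, 4): -13.84,
    (2, 3): -6.59, (2, 4): -11.27, (3, 4): -4.68,
}

decisions = {pair: (abs(d) > t_bound, abs(d) > s_bound)
             for pair, d in differences.items()}

print(f"T-method bound: {t_bound:.2f}, S-method bound: {s_bound:.2f}")
for pair, (by_t, by_s) in decisions.items():
    print(pair, differences[pair], "T:", by_t, "S:", by_s)
```

Only the comparison of groups 2 and 3 is classified differently by the two methods, which illustrates that the S-method is the more conservative of the two for simple pairwise contrasts.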

c. In the contrast (/* t -r 

<■1 - ~l. e 
Thus the value of 6- ij 



The value of f is 

V - <21.25 t- 27.84), f2 - <18.65 - 32.521/2 - -1.05. 
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The 95 % confidence interval is 

V ± V U- - — 1.05 ± 4.73 = (-5.78, 3.68). 

7. a. ANOVA table: 

Source of car. df MS F 


Between groups 4 695.06 10.57 

Within groups 95 65.75 

The obtained value of F is significant at the .05 level.

b. $MS_w = 65.75$, $J = 5$, $n = 20$, $_{.95}q_{5,95} = 3.94$. Any difference between two
means which is larger in absolute value than $_{.95}q_{5,95}\sqrt{MS_w/n} = 7.13$ is
significant at the .05 level by the T-method. The significant differences are
identified by an asterisk in the following table of differences:

Differences between means 



Arc 

5F 

SO 

AY 

5K 

50 

at 

-1.2S 

—8.40* 

8.00* 

-7.15* 

9.25* 

16.40* 


AO 

1.70 

2.95 

10.10* 

-6.30 


* Difference is significant at the .05 level. 


c. The 95% confidence interval on — ( «* + ... + /' s )/4 is (-6.35, 6.37). 

CHAPTER 17

1. ANOVA table:

Source of var.    df     SS         MS      F

Factor A           4      64.26    16.07    1.70

Factor B           5      46.85     9.37    0.99

A x B             20    1164.05    58.20    6.15

Within           120    1136.53     9.47

Total            149    2411.69

            Critical value               Obtained value    Decision

Factor A    $_{.99}F_{4,120} = 3.48$     1.70              Do not reject $H_0$

Factor B    $_{.99}F_{5,120} = 3.17$     0.99              Do not reject $H_0$

A x B       $_{.99}F_{20,120} = 2.03$    6.15              Reject $H_0$
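
The mean squares and F-ratios in the two-factor table of answer 1 follow from the sums of squares and degrees of freedom. The sketch below assumes the fixed-effects model, in which every effect is tested against the within-cells mean square; the dictionary layout and names are illustrative.

```python
# (degrees of freedom, sum of squares) for each source, as read from answer 1
sources = {
    "Factor A": (4, 64.26),
    "Factor B": (5, 46.85),
    "A x B": (20, 1164.05),
    "Within": (120, 1136.53),
}

# Mean square = sum of squares / degrees of freedom
mean_squares = {name: ss / df for name, (df, ss) in sources.items()}

# Fixed-effects model: each effect is tested against MS within
f_ratios = {name: mean_squares[name] / mean_squares["Within"]
            for name in ("Factor A", "Factor B", "A x B")}

total_df = sum(df for df, _ in sources.values())
total_ss = sum(ss for _, ss in sources.values())

for name, f in f_ratios.items():
    print(f"{name}: MS = {mean_squares[name]:.2f}, F = {f:.2f}")
print(f"Total: df = {total_df}, SS = {total_ss:.2f}")
```

The component sums of squares and degrees of freedom add up to the Total row, which is a quick consistency check on any ANOVA table.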


2. a. Yes b. Yes c. No 

3. Females studying French by the aural-oral method should score 8 points above
the general mean on the language mastery test:

$\mu_{ij} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} = \mu + 6 + 2 + 0 = \mu + 8$.
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5. Case a shows a disordinal interaction. Case b shows an ordinal interaction. 
In case c there is no interaction since the two lines of the graph are parallel.

6. When “treatments” is placed on the abscissa, the ordinal interaction graphed
in case b of 5 becomes disordinal.


7. ANOVA table:

Source of var.          df    MS        F

Type of question         1     18.38    0.91

Position of question     1    532.04   26.43

Interaction              1     77.04    3.83

Within cells            20     20.13

The critical values against which each of the three F-ratios is compared are all
the same: $_{.90}F_{1,20} = 2.97$. Hence, we see that the F-tests for “Position of
Question” and “Interaction” result in rejection of the null hypothesis.

8. ANOVA table:

Source of var.           df    MS        F

Intelligence              1     92.04    3.56

Method of instruction     2    233.79    9.03

Interaction               2     15.39    0.59

Within cells             18     25.88

Hypothesis tested                      Critical value             Obtained F    Decision

$\sum \alpha_i^2 = 0$                  $_{.95}F_{1,18} = 4.41$    3.56          Do not reject $H_0$

$\sum \beta_j^2 = 0$                   $_{.95}F_{2,18} = 3.55$    9.03          Reject $H_0$

$\sum\sum \alpha\beta_{ij}^2 = 0$      $_{.95}F_{2,18} = 3.55$    0.59          Do not reject $H_0$

9. ANOVA table:

Source of var.          df     MS           F

Reading achievement      2     28,103.97    158.86*

Reading difficulty       1        710.81      4.13*

Interaction              2         48.02      0.27

Within cells           234        176.91

* Significant at the .05 level.




10. ANOVA table: 


Source of tar. 


Se* 

Reading readme** activity 

Interaction 

Within cells 


* Significant at the .05 level. 


I 


24 


MS 


1.028 

0.083 

0.M2 

0.161 


F 


6JI* 

0.45 

0.26 
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11. a. proportional b. disproportional 

c. proportional d. disproportional 

12. If two observations are randomly discarded from the cell at the intersection of 
the third row and third column, the cell frequencies would be proportional. 

13. The layout of cell means is as follows: 


Factor B 

9.675 10.050 

Factor A 10.242 10.512 

9.011 9.883 

From these data, the following sums of squares are found:

$SS'_A = 0.868$, $SS'_B = 0.384$, $SS'_{AB} = 0.104$.

From the original layout of 40 observations, $MS_w$ is found to be equal to
0.7830.

The value of c in Eq. (17.15) is 0.1604. Hence, $MS'_w = 0.126$. The
remainder of the analysis is reported in the ANOVA table below:

Source of var.               df    MS'      F

Socio-economic status (A)     2    0.434    3.44

Type of school (B)            1    0.384    3.05

A x B                         2    0.052    0.41

Within cells                 34    0.126

The critical value of F for testing the null hypothesis for factor B at the .05
level is about $_{.95}F_{1,34} = 4.12$. Hence, the hypothesis of no difference between
types of school cannot be rejected.

For testing factor A and the interaction of factors A and B at the .05 level,
a critical value of $_{.95}F_{2,34} = 3.27$ is used. The null hypothesis for factor A can
be rejected, but no evidence exists for rejecting the hypothesis of no interaction.

CHAPTER 18

1. a. 61 = .042. 

b. Based upon our estimates, the variance of ratings for children being rated 
by the same judge is about 250 times as large as the variance of judges. 

c. The 95% confidence interval on $\sigma_a^2/\sigma^2$ is (-.03, .13). Since it is impossible for
$\sigma_a^2/\sigma^2$ to be negative, we could set the interval from 0 to .13.

2. a. $MS_a = 237.25$; $MS_w = 43.92$.

$\hat{\sigma}_a^2 = (MS_a - MS_w)/12 = 16.11$.

$\hat{\sigma}^2 = 43.92$.

b. The 95% confidence interval on $\sigma_a^2/\sigma^2$ is (.16, 2.66).
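
The variance-component estimates in answer 2a can be verified numerically. The sketch assumes the one-factor random-effects model, in which the between-levels component is estimated by (MS_a - MS_w)/n with n = 12 taken from the divisor shown in the answer; the variable names are illustrative.

```python
ms_a = 237.25   # mean square between levels of the random factor
ms_w = 43.92    # mean square within levels
n = 12          # divisor used in the answer (assumed: observations per level)

# Estimated variance component for the random factor
var_a_hat = (ms_a - ms_w) / n

# Estimated error variance is MS within itself
var_e_hat = ms_w

# Point estimate of the ratio on which the confidence interval in part b is set
ratio_hat = var_a_hat / var_e_hat

print(f"var_a_hat = {var_a_hat:.2f}, var_e_hat = {var_e_hat:.2f}, ratio = {ratio_hat:.2f}")
```

The point estimate of the ratio falls inside the interval (.16, 2.66) reported in part b, as it should.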

3. $\hat{\sigma}_a^2 = (MS_A - MS_{AB})/(nJ)$
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4. AN OVA table: 

Source of tar. df 



Persons 29 1 83.84 

Interaction 29 108.52 

The critical value for the F-test is $_{.95}F_{1,29} = 4.18$. Obviously the obtained
F-ratio is far below this critical value. Hence, the hypothesis of no difference
between Verbal and Performance IQ cannot be rejected.

5. ANOVA table:

Source of var.      df    MS       F

Between subtests     3    23.41    3.22

Persons             11    19.07

Interaction         33     7.26

Set $\alpha$ = .01. Apparent test: The critical value for the F-ratio with the
“apparent test” is $_{.99}F_{3,33} = 4.40$. Since the F-ratio of 3.22 does not exceed
the critical value, the null hypothesis of no differences among population means
on the four types of test is not rejected at the .01 level.

6. a. 20 x 32 x 8 = 5120.

b. "Raters" and "ratees" are essentially random-effects factors, because
20/10,000 and 32/20,000 are negligibly different from 0. "Traits" is a
fixed-effects factor, because 8/8 = 1. This is a fully crossed design; there is no
nesting.

c. There are seven sources of variation in these data: three main effects, three 
two-factor interactions, and one three-factor interaction (untestable residual, 
here). The sources of variation are as follows: between raters, between 
ratees, between traits, raters x ratees, raters x traits, ratees X traits, and 
raters x ratees x traits. 

d. Let raters be factor 1 , ratees factor 2, and traits factor 3. Then 


EfMSj) = o* -r 8o*j 4- 256nJ 

£(AfSj) = a* + 8 a* 2 -f 160*7 \ 

E(MS t ) = + 32of s -f 20a*, 4- 640a* 

E(.MS n ) = n- 4 

E<MS„) = a* 4- a '- u 4. 32<r* s 

E(AfS„) =* 0* 4- <r* a 4-20 o| 4 

E(MS ia ) = 
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P MSs 

Ut MSj 3 + MS :3 - MS ia ' 
where / 2 is approximately 

(MS a + MS,, - MS,,,)* 
MSf, MS ‘ , MS* 2 , 

133 + 217 +-JT2T 


(see Brownlee, 1965, p. 301). 


^589 4123 


£ 


ms 12 
A fS m ' 


there being no exactly appropriate F for testing the interaction of raters with 
ratees. 


_ MS a 

A «,„• 


^217.4123 


MS» 

MS, *3 


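$\hat{\sigma}_1^2 = (MS_1 - MS_{12})/256$.

$\hat{\sigma}_2^2 = (MS_2 - MS_{12})/160$.

$\hat{\sigma}_3^2 = (MS_3 - MS_{13} - MS_{23} + MS_{123})/640$.

$\hat{\sigma}_{12}^2 = (MS_{12} - MS_{123})/8$.

$\hat{\sigma}_{13}^2 = (MS_{13} - MS_{123})/32$.

$\hat{\sigma}_{23}^2 = (MS_{23} - MS_{123})/20$.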

is not estimable from these data, because 
there is no MS t whose E(MS g ) = o 2 . 

7. a. The two nested factors are raters (nested within political party of rater) and
ratees (nested within political party of ratee). It happens here that the 
political parties are the same for raters and ratees. This is a doubly nested 
design. 

b. Yes. Raters cannot cross political party of rater, and ratees cannot cross 
political party of ratee, but they can cross everything else. See Stanley 
(1961a) for further details. 

c. "Political party of rater” crosses everything except "rater.” 

d. Rater and ratee are likely to be considered random-effects factors. 

e. As you probably surmised, Democrat raters tended to rate Democrat ratees 
much higher than they rated Republican ratees, and Republican raters 
tended to rate Republican ratees much higher than they rated Democrat 
ratees. Thus raters tended to prefer the ratees who were prominent in their 
own party. 

f. The interaction of party of rater with party of ratee with trait rated suggests
that raters' bias in favor of their own party was not uniform across traits.
Raters tended to rate ratees of their own party relatively higher on some
traits than on others, and similarly for ratees of the other party.
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VERIFICATION OF 

STANDARD SOLUTION TO 
LEAST-SQUARES CRITERION 


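See page 149 first. We begin here with

$$\hat{Y}_i = b_1 X_i + b_0 = b_1 X_i + (\bar{Y}_{.} - b_1 \bar{X}_{.}) = b_1 (X_i - \bar{X}_{.}) + \bar{Y}_{.}.$$

Then consider adding the constant c to $b_1$ and the constant d to $b_0$, where c
and d may be any real numbers, negative, zero, or positive:

$$\hat{Y}_i' = (b_1 + c) X_i + [(\bar{Y}_{.} - b_1 \bar{X}_{.}) + d] = [b_1 (X_i - \bar{X}_{.}) + \bar{Y}_{.}] + (c X_i + d).$$

We need to show that $\sum (Y_i - \hat{Y}_i)^2 \le \sum (Y_i - \hat{Y}_i')^2$. By substituting
for $\hat{Y}_i$ and $\hat{Y}_i'$ their values shown above, we secure the following inequality
to be proved:

$$\sum_{i=1}^{n} [Y_i - b_1 (X_i - \bar{X}_{.}) - \bar{Y}_{.}]^2 \le \sum_{i=1}^{n} \{[Y_i - b_1 (X_i - \bar{X}_{.}) - \bar{Y}_{.}] - (c X_i + d)\}^2.$$

After squaring and summing, we have

$$\sum_{i=1}^{n} [Y_i - b_1 (X_i - \bar{X}_{.}) - \bar{Y}_{.}]^2 \le \sum_{i=1}^{n} [Y_i - b_1 (X_i - \bar{X}_{.}) - \bar{Y}_{.}]^2 + \sum_{i=1}^{n} (c X_i + d)^2 - 2 \sum_{i=1}^{n} [(Y_i - \bar{Y}_{.}) - b_1 (X_i - \bar{X}_{.})](c X_i + d).$$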
Subtract $\sum_{i=1}^{n} [Y_i - b_1 (X_i - \bar{X}_{.}) - \bar{Y}_{.}]^2$ from both sides and obtain

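$$0 \le \sum_{i=1}^{n} (c X_i + d)^2 - 2 \sum_{i=1}^{n} [(Y_i - \bar{Y}_{.}) - b_1 (X_i - \bar{X}_{.})](c X_i + d).$$

$\sum (c X_i + d)^2$ cannot be negative. To complete the proof we shall show in
a straightforward manner that, when $b_1 = s_{xy}/s_x^2$,

$$\sum_{i=1}^{n} [(Y_i - \bar{Y}_{.}) - b_1 (X_i - \bar{X}_{.})](c X_i + d) = 0.$$

First, express $c X_i + d$ as $(c X_i + d - c \bar{X}_{.}) + c \bar{X}_{.}$, which is equivalent
to $c (X_i - \bar{X}_{.}) + (c \bar{X}_{.} + d)$. Then

$$\begin{aligned}
\sum [(Y_i - \bar{Y}_{.}) &- b_1 (X_i - \bar{X}_{.})][c (X_i - \bar{X}_{.}) + (c \bar{X}_{.} + d)] \\
&= c \sum (X_i - \bar{X}_{.})(Y_i - \bar{Y}_{.}) + (c \bar{X}_{.} + d) \sum (Y_i - \bar{Y}_{.}) - c\, b_1 \sum (X_i - \bar{X}_{.})^2 - b_1 (c \bar{X}_{.} + d) \sum (X_i - \bar{X}_{.}) \\
&= c (n - 1) s_{xy} + (c \bar{X}_{.} + d) \cdot 0 - \frac{s_{xy}}{s_x^2}\, c (n - 1) s_x^2 - b_1 (c \bar{X}_{.} + d) \cdot 0 \\
&= c (n - 1) s_{xy} + 0 - c (n - 1) s_{xy} - 0 = 0.
\end{aligned}$$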

Note that this is equivalent to saying, as in Sec. 9.4, that the covariance of
the $X_i$’s (or of the linear transformation of the $X_i$’s, which is $c X_i + d$ here)
with the discrepancies between the actual $Y_i$’s and the $\hat{Y}_i$’s predicted from
the $X_i$’s via $b_1$ and $b_0$ is always exactly zero. Thus the correlation between
initial status, $X_i$ or $c X_i + d$, and the $Y_i - \hat{Y}_i$ is zero.

Therefore,

$$0 \le \sum_{i=1}^{n} (c X_i + d)^2,$$

which is a correct statement. This completes the proof.

Thus, if one were to use as his regression-slope coefficient $b_1 + c$ rather
than $b_1 = s_{xy}/s_x^2$, and as his intercept $b_0 + d$ rather than $b_0 = \bar{Y}_{.} - b_1 \bar{X}_{.}$, he
would increase the sum of the squared errors of estimate by the nonnegative
amount $\sum (c X_i + d)^2$.

This proves the contention, but it does not show how $b_1$ and $b_0$ were
derived (via the differential calculus) in the first place. That derivation appears
in most textbooks of basic mathematical statistics.
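
The result of the proof can be illustrated numerically. The sketch below uses a small made-up data set (not from the text) and arbitrary perturbations c and d; it fits the least-squares line, perturbs the slope and intercept, and checks that the sum of squared errors increases by exactly the sum of the squared values of cX + d. The function names are illustrative.

```python
def least_squares(xs, ys):
    """Return (b1, b0): slope s_xy / s_x^2 and intercept Ybar - b1 * Xbar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return b1, y_bar - b1 * x_bar


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared discrepancies between actual and predicted Y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b0 = least_squares(xs, ys)
c, d = 0.3, -0.5  # arbitrary constants added to the slope and the intercept

increase = (sum_squared_errors(xs, ys, b1 + c, b0 + d)
            - sum_squared_errors(xs, ys, b1, b0))
predicted_increase = sum((c * x + d) ** 2 for x in xs)

print(f"increase in SSE = {increase:.6f}")
print(f"sum of (cX + d)^2 = {predicted_increase:.6f}")
```

The two printed quantities agree up to floating-point rounding for any choice of c and d, because the cross-product term in the proof vanishes at the least-squares solution.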
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Kendall’s tau. 176-79,316-17 
Kolmogorov-Smirnov test. 506 

Kurtosis, 90-92
of the normal curve, 98

Law of large numbers, 202 
Least-squares, 138, 149-50, 570-71
estimates, 342, 403-6
Leptokurtic curves, 91, 98
Leveling, 490, 492
on interval-scale, 494-95
on ratio-scale, 494-95
variable, 494-95
Level of significance, 279-83
Levene’s test, 506
Limit: 

lower real, 76 
upper real, 76 

Linear regression equation, 187 
Linear relationships, 118, 124 
Linear transformation, 119-20

'"“w™ T«t. 335 
8 Thorndike Picture Reasoning Test, 335 

Manifest Anxiety Scale, 123
Marginal distributions, shapes of, 124-25
Mathematical statistics T ° f Amer ' ca ’ 

Mean, 60-67
arithmetic, 60, 65, 341
calculation of, 61-63
of combined groups, 65-66
contraharmonic, 72
defined, 60
deviation, 79, 85-86
geometric, 71
grand, 342
harmonic, 71-72
defined, 72
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Mean {coat.) 
interpretation of. 67 
of a population, 293-95 
null hypothesis in terms of, 345 
properties of, 64-65 
of ratios, 72 
sample, 60 

standard error of, 248 
unweighted analysis, 440 
variance error of, 248 
Mean square (MS). 347, 365 
determining the expectations of, 479-81 
distribution of, 424-28 
expected values of, 421-24 
null hypotheses and, 423 
two-factor, 418 
Measurement. 7-16 
of central tendency, 57-74 
choice of, 68-71 
geometric mean, 71 
harmonic mean, 71-72 
the mean, 60-67 
the median, 59-60, 65-67 
the mode. 58-59, 65-67 
ratio of means, 72 
defined, 7 

nonlinear relationships between variables, 150-52

of relationships, 109-32, 155-94
computational formula for $r_{xy}$, 113-14
differences of variables, 127-29
effect on $r_{xy}$ of transforming scores, 119-20
illustration of the calculation of $r_{xy}$, 114-16
interpretation and use of $r_s$, 175-76
interpreting correlation coefficients, 121-27
interval (ratio), 156-57
Kendall’s tau, 176-79
multiple correlation, 186-91
multiple prediction, 186-91
nominal-dichotomous, 156
ordinal, 8-10, 156
part correlation, 182-86
partial correlation, 182-86
Pearson product-moment correlation
coefficient, 109-13, 122, 155, 158, 159,
161, 163, 166, 186
properties of $\phi$, 161-62
range of values of $r_{xy}$, 117-19
Spearman rank-correlation coefficient,
172-75, 176, 316
tied ranks problem, 175
variance of sums, 127-29
scales of, 8-14
characteristics, 12
examples of, 12
interval, 10-11
nominal, 8
ordinal, 8-10, 156
ratio, 11-14

of variability, 14-16, 75-94 
kurtosis, 90-92 
mean deviation, 79, 85-86 
the range, 76-78 
skewness, 88-90 


Measurement (cont.)

standard deviation, 82-83
standard scores, 86-88
the variance, 79-82, 83-85
Measurement in Today's Schools (Stanley), 70, 266

Measuring Intelligence (Terman and Merrill), 104
Median, 59-60, 65-67 
calculation of, 59-60 
of combined groups, 65-66 
defined, 59 
interpretation of, 67 
Median interval, 60 
Mesokurtic curves, 91, 98
Metropolitan Achievement Tests, 399, 453 
Metropolitan Reading Test, 459-60 
Miller Analogies Test, 335 
Million Random Digits with 100,000 Normal
Deviates, A, 510-12

Minnesota Teacher Attitude Inventory, 334, 
335 

Minnesota Tests of Creative Thinking, 335 
Mode, 58-59. 65-67 
of combined groups, 65-66 
conventional use of, 58-59 
interpretation of, 67 
major, 59 
minor, 59 

Modern Language Association Foreign 
Language Proficiency Tests, 189 
Moments, 223-24 

Multiple comparison procedures, 381-99 
the S-method, 388-93
compared to the T-method, 395-97
confidence intervals around contrasts by, 393-95

the T-method, 383-84
compared to the S-method, 395-97
confidence intervals around contrasts by, 385-88

Multiple correlation coefficient, 186-91 
Multiple prediction, 186-91 
Multiple-prediction equation, 187 
Multiple-regression equation, 187 
Multiplicative rule of probabilities, 202-3 
Multivariate prediction. 187 

National Industrial Conference Board, 39 
Newman-Keuls procedure, 382, 388
Neyman-Pearson hypothesis-testing theory, 

287 

Nominal-dichotomous measurement, 156
Nominal measurement, 8
Noncentral chi-square distribution, 425-26
Nondirectional alternative hypotheses, 288-89

Nonlinear relationships between variables, 
150-52 

Normal curves: 
distribution, 98-104 
family of, 100-101 
kurtosis of, 98 
skewness of, 98 
unit, 98, 100 
uses of, 102-4 

Normal distribution, 95-108, 228-29, 236-38 
bivariate, 105-7, 140. 141, 274 
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Normal distribution (cow.) 
history of, 95-97 
the normal curve, 98-104 
as a standard, 102 
Notation, sigma, 18-24 
Null hypotheses, 284, 341, 502-4 
expected values of mean squares and, 

423 

F-test of, 353-57, 366
statement of, 411-14 
in terms of population means, 345 
tests of, 428-31 

Ogive, 45 

One-tailed test, 288-89 
One-tailed t-test, 295
Operations, symbolizing, 17-18
Ordinal interaction, 410-11
Ordinal measurement, 8-10, 156

Parameters, 240-42 
Part correlation, 182-86 
Partial correlation, 182-86 
Pascal's triangle, 208-9 
Pearson product-moment correlation coeffi- 
cient, 109-13, 122, 155, 158, 159. 161, 
163, 166, 186 
Percentile curves, 45, 47 

for representing a series of distributions, 
48-50 

Percentile range, 77 
Percentiles, 33 
defined. 34 
determining, 14-38 
Permutations, 203-6 
$\phi$, properties of, 161-62
Phi coefficient, 158-59, 315
Picture graphs, 47
Planning of Experiments (Cox), 508
Platykurtic curves, 91, 98
Point-biserial correlation coefficient, 163-64, 318

Point estimate, 256 
Point estimation, 256 
Polygons: 
frequency, 44 
asymmetrical, 70 

f° r representing a series of distributions, 48, 

Population, 212, 240-42 
bivariate normal, 264 
estimation, 240-42 
homogeneous, 3J3 
mean of, 293-95 
null hypothesis in terms or, 345 
product-moment correlation coefficient, 

proportion, 321-24 
variance of, 30t-l 
Power: 

of the F-iesl. 3J7. 376-77 
hypotheses testing, 283-88 
Prediction, 133-5-4 
multiple, 186-91 
multivariate. || 7 
univariate. 187 
Prediction line, 137 


Probability, 195-227 , 

as an area, 218-20 
binomial distribution, 207-12. 
combinations, 203-6 
combining, 198-203 
first addition rule, 198-99 
multiplicative rule, 202-3 
second addition rule, 200-202 
defined , 197 
density function, 219 
distributions, 218-19 

expectations, 220-23 
as a mathematical system, 196-98 
moments, 223-24 
permutations, 203-6 
" randomness, 212-1 5 
random sampling, 212-15 
random variable, 215-17 
^ypes of, 217-18 
Pro’duct- moment biserial, 163 
Prod uct-momen t correlation coefficient, 308-10 
Psychological Bulletin, 333 
Psychological statistics, 4 

Quantiles, 33 

Quarterly Publications of the American Statis- 
tical Association, 53 
Quartiles, 33 

Quasi-experimentation, 501 

RAND Corporation, 510 
Randomized-block design, 491-92
Randomness, 212-15 
Random sampling. 212-15, 242-43 
estimation, 242-43 
Random variable, 215-17 
continuous, 217-18 
discrete, 217-18 
types of, 217-18 
Range, 76-78 
exclusive, 76-77 
inclusive, 76-77 
interquartile, 78 
percentile, 77 
semi-interquartile, 77-78 
defined, 78 

Rank-biserial correlation coefficient, 320 
Rank Correlation Methods (Kendall), 540 
Rank order, 26-27
Ratio, correlation, 150-51
Ratio measurement, 11-14
Ratio of means, 72 
Ratio-scale, leveling on, 494-95 
Region of rejection, 281 
Relationships: 
between b, and b,. 144-49 
Unear. 118, 124 

measurement of, 109-32, 155-94 
computational formula for r n , 113-14 
differences of variables, 127-29 
effect on r„oftransformingscores, 119-21 
illustration of the calculation ofr„. 114-1' 
interpretation and use of r„ 175-76 
interpretation of correlation coefficients 
121-27 

interval (ratio), 1 56-57 
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Relationships (com.') 

Kendall's tau, 176-79 
multiple correlation, 186-91 
multiple prediction, 186-9] 
nominal-dichotomous, 156 
ordinal, 8-10, 1S6 
part correlation, 182-86 
partial correlation, 182-86 
Pearson product-moment correlation 
coefficient, 109-13, J 22, 15 5, 158, 159, 
161, 163, 166, 186 
properties of <£, 161-62 
range of values of r^, 117-19 
Spearman rank-correlation coefficient, 
172-75, 176, 316 
tied ranks problem, 175 
variance of sums, 127-29 
nonlinear, between variables, 150-52 
Repeated measures design, 469 
Replication of the experiment. 424 
Residual change score, 182-83 
r„ interpretation and use of, 175-76 

computational formula for, 113-14 
effect of transforming scores, 119-20 
illustration of the calculation of, 114-16 
range of values of, 117-19 

Sample mean, 60 
Samples, 212, 240-42 

dependent, 297-300, 306-7, 313-14, 326-28 
estimation, 240-42 

independent, 295-97, 303-6, 311-13, 324-26 
Sample space, 196-97, 215, 216 
Sample variance, 342 
Sampling: 

distributions. 243-50 
error, 275, 490 
random, 212-15, 242-43 
estimation, 242-43 
Sampling Techniques (Cochran), 241 
Scales of measurement, 8-14 
characteristics, 12 
examples of, 12 
interval, 10-11 
nominal, 8 
ordinal, 8-10, 156 
ratio, 1 1-14 

Scatter diagrams, 110, 111, 116, 123-24, 125,
138, 141, 247

Scholastic Aptitude Test, 188
School Mathematics Study Group (SMSG),
487

Science Curriculum Improvement Study 
(SCIS), 463, 465 
Scientific hypotheses, 272-74 
Score class, 28 

Semi-interquartile range, 77-78
defined, 78 
Series, ungrouped, 26 
Sigma notation, 18-24 
Significance: 
level of, 279-83 
testing, 481-82 

Simple random sampling, 212-15 
Simultaneous confidence intervals, 385, 387, 
394 


Skewness, 88-90 
negative, 90 
of the normal curve, 98 
positive, 90 
S-method, 388-93 
compared to the T-method, 395-97 
confidence intervals around contrasts by, 
393-95 

Smooth curves, 44-45 
Sociological statistics, 4 
Spearman's rank-correlation coefficient, 172- 
75, 176, 316 

Standard deviation, 82-83 
Standard error of estimate, 141-44 
Standard error of the correlation coefficient,
249

Standard error of the mean, 248 
Standard scores, 86-88 
Stanford Achievement Test, 307-8 
Stanford-Binet Intelligence Test, 51, 103-4, 1 10 
Statistical analysis, units of, 505-9 
Statistical hypotheses, 272-79 
Statistical inference: 
estimation, 240-70 
interval, 256-68 
population, 240-42 
properties of estimators, 250-56 
random sampling, 242-43 
samples, 240-42 

sampling distribution concept, 243-50 
hypothesis testing, 271-91 
critical region, 279-83 
directional alternatives, 288-89 
level of significance, 279-83 
nondirectional alternatives, 288-89 
one-tailed, 288-89 
power, 283-88 
scientific, 272-74 
statistical, 272-79 
two-tailed, 288-89 
type 1 error, 279-83 
type II error. 283-88 
theoretical distributions, 228-39 
Statistical Methods for Research Workers 
(Fisher and Yales). 536 
Statistical Methods in Research (Johnson), 306 
Statistical Tables for Biological, Agricultural 
and Medical Research (Fisher and 
Yates), 520, 521 
Statistics, 1-6 
applied, 4 
defined, 4-5 

descriptive. 2 
economic, 4 
educational. 4 
inferential. 2-3 
mathematical, 4 
psychological, 4 
sociological, 4 
summary. 57 
Statisticulation, 1 


Straight line- 
general equation for , 135 
intercept of, 1 35 
slope of. 135 
Stratifying, 490, 492 
on ordinal-scale, 493-94 
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StratifV ,n S variable, 493-94 
Student’s /-distribution, 234-38, 293-94, 296, 
298, 303, 318 
Summary statistics. 57 
Sums of squares, 343-45, 364-65
computing, 476-79
between groups, 345
within groups, 345
total, 343-45

in the two-factor ANOVA, 414-16 
Systematic biases. 214 

Tables. SV3-S4 
the form of, 31-32 
Tau, 176-79, 316-17

t-distributions, 234-38, 293-94, 296, 298, 303,
318

“Tests of Significance” (Kruskal), 290
Tetrachoric correlation coefficient, 166, 318-19
T-method, 383-84 
compared to the S- method, 395-97 
confidence intervals around contrasts by, 
385-88 

Transformation: 

Fisher’s Z-, 265-68, 308-10, 311
linear, 119-20
Tukey method, 388, 397-98
Two-tailed test, 288-89
Two-tailed t-test, 295
Typewriter graphs, 46. 47 

Ungrouped series, 26
Unit normal curves, 98, 100
Unit normal deviation, 102
Units of statistical analysis, 505-9
Univariate prediction, 187
Unweighted means analysis, 440
Uses of Things test, 273 

Value, absolute, 83 

Variability, measurement of, 14-16, 75-94
kurtosis, 90-92
mean deviation, 79, 85-86
the range, 76-78
skewness, 88-90
standard deviation, 82-83
standard scores, 86-88
the variance, 79-82, 83-85
Variables:

dependent, 136
differences of, 127-29
exact value of, 55-56
independent, 136
leveling, 494-95

nonlinear relationships between, 150-52
random, 215-17
continuous, 217-18
discrete, 217-18
types of, 217-18
reported value of, 55-56
stratifying, 493-94
Variance, 79-82, 83-85, 452-85
analysis of. are ANOVA 
calculation of, 81-82 
computing sums of squares. 476-79 
definitions of terms, 473-74 
degre« of freedom for $< 


Variance (cont.) 

expectations of mean squares, . 

heterogeneous, 371-72 
homogeneous, 295, 372-74 
testing, 374-76 

lints of ANOVA table, 474-75 
mixed -effects, 463-71 
model for the data, 402-3 

one- factor analysis of, 338-8Q 

degrees of freedom, 345-47, 365 

distribution theory. 350-53 

estimates or terms in the model, 3 «-4J 
expectations of MS t and M S„, 347-30 
failure to meet assumptions of, 368-/4 
F-lest of the null hypothesis, 353-57, 
366 

layout of data, 338-39 
mean squares, 347, 365 
model for the data, 339-42 
with n observations per cell, 357-62 
'mull hypothesis in terms of population 
means, 345 

power of the f- test, 376-77 
sums of squares, 343-45, 364-65 
testing homogeneity, 374-76 
wilh unequal n’s, 362-68 
of a population, 301-3 
properties of, 83-85 
random-effects, 452-62 
sample, 342 

significance testing, 481-82 
of sums, 127-29 
two-factor analysis of, 400-451 
computational procedures, 418-20 
degrees of freedom, 416-18 
distribution of mean squares, 424-28 
with equal numbers ot observation per 
cell, 431-32 

expected values of mean squares, 421-24 
fixed-effects with unequal n's, 432-43 
hypothesis tests of the null hypotheses, 
428-31 

layout of data, 400-402 
least-squares estimation of the model, 
403-6 

mean squares, 418 
model for data, 319-42 
multiple comparisons in, 443-45 
nature of interaction. 406-1 1 
sources of variation. 414 
statement of null hypotheses, 4tl-14 
sums of squares in, 414-16 
symbolization of data, 400-402 
writing the ANOVA (able, 471-72 
Variance error. 254-56 
of the mean, 248 
Venn diagrams. 200, 201 

Wechsler Adult Intelligence Scale, 150 
Wechsler Intelligence Scale, 249 
Wech s' e t l ntc 1 U gen ce Scale for Children 
(WlSC). 260-61, 483-84 
Weight, 125 

W,d c -Range Vocabulary Test. 335 


>r sources of, 475-76 


Z-transformation, fiber’s. 265-68, 308-10, 



